From David.Messina at sbc.su.se  Tue Dec  1 05:14:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 1 Dec 2009 11:14:40 +0100
Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem
	to be parsed
In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk>
	<50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se>
	<8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <ECCDC4FE-DF46-4CF8-806F-750837DED8AA@sbc.su.se>

Hi Mick,

Did you try running the test case that you had originally attached to the bug report? Or is the below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave


On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
> 
> Just got round to looking at this.
> 
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
> 
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
> 
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
> 
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
> 
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
> 
> Is this another bug report?
> 
> Thanks again for all your help
> 
> Mick
> 
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
> 
> Hi Mick,
> 
> Sure thing -- the current build from subversion is packaged up every  
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
> 
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
> 
> 
> Dave
> 
> 
> 
> 
> On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote:
> 
>> Hi Dave
>> 
>> Thanks for the hard work.
>> 
>> Trying to get the latest updates so I can use this... don't have svn  
>> on my server, tried to install it and I don't have python either,  
>> which is needed to install it.
>> 
>> I face about 3 weeks whilst my IT department sort this out, unless I  
>> can access the changes any other way?
>> 
>> Thanks
>> Mick
>> 
>> -----Original Message-----
>> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- 
>> daemon at portal.open-bio.org]
>> Sent: 20 November 2009 15:12
>> To: michael watson (IAH-C)
>> Subject: [Bug 2937] Strand in fasta35 output does not seem to be  
>> parsed
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2937
>> 
>> 
>> online at davemessina.com changed:
>> 
>>          What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>            Status|NEW                         |RESOLVED
>>        Resolution|                            |FIXED
>> 
>> 
>> 
>> 
>> ------- Comment #7 from online at davemessina.com  2009-11-20 10:12 EST  
>> -------
>> Fixed in r16394.
>> 
>> Michael, thanks for the report. Your test cases pass, but please  
>> reopen the bug
>> if needed.
>> 
>> 
>> -- 
>> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? 
>> tab=email
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
> 


From e.osimo at gmail.com  Tue Dec  1 13:05:48 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 1 Dec 2009 19:05:48 +0100
Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test
Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com>

Hello everyone,
I'm trying to get the p value of a test run with Statistics::TTest.
I cannot find a function for this: I can find out whether the null hypothesis
is rejected at a certain confidence level, but I cannot make the script show
me the actual p value.
Do you know of other modules or scripts that can do that?

Thanks
Emanuele
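
For readers looking for the arithmetic itself: once a t statistic and its degrees of freedom are known (from Statistics::TTest or computed by hand), the two-sided p value is a tail area of the Student t distribution. The sketch below is one way to evaluate it in plain Perl with no extra modules. It is not part of Statistics::TTest, and the names t_test_p_value and simpson are made up for this illustration. It assumes df >= 1 and uses Simpson's rule, so treat the result as a numerical approximation.

```perl
use strict;
use warnings;

# Two-sided p value for a t statistic with $df degrees of freedom.
# Substituting x = sqrt(df) * tan(theta) turns the Student t tail area into
#   p = I(theta0, pi/2) / I(0, pi/2),
# where I(lo, hi) is the integral of cos(theta) ** (df - 1) and
# theta0 = atan(|t| / sqrt(df)). Both integrals use Simpson's rule.
# Assumption of this sketch: df >= 1.
sub t_test_p_value {
    my ($t, $df) = @_;
    die "df must be >= 1\n" if $df < 1;
    my $half_pi = atan2(1, 0);
    my $theta0  = atan2(abs($t), sqrt($df));
    my $f = sub {
        my $c = cos($_[0]);
        $c = 0 if $c < 0;              # guard against rounding at pi/2
        return $c ** ($df - 1);
    };
    return simpson($f, $theta0, $half_pi) / simpson($f, 0, $half_pi);
}

# Composite Simpson's rule on [$lo, $hi] with $n (even) intervals.
sub simpson {
    my ($f, $lo, $hi, $n) = @_;
    $n ||= 2000;
    my $h   = ($hi - $lo) / $n;
    my $sum = $f->($lo) + $f->($hi);
    for my $i (1 .. $n - 1) {
        $sum += ($i % 2 ? 4 : 2) * $f->($lo + $i * $h);
    }
    return $sum * $h / 3;
}

printf "t = 2.10, df = 18: p = %.4f\n", t_test_p_value(2.10, 18);
```

For df = 1 the integrand is constant, so t = 1 gives p = 0.5, which makes a handy sanity check.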

From cjfields at illinois.edu  Tue Dec  1 14:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below.  I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.

chris

Begin forwarded message:

> From: <utilities-announce at ncbi.nlm.nih.gov>
> Date: December 1, 2009 12:59:34 PM CST
> To: <utilities-announce at ncbi.nlm.nih.gov>
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
> 
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>  
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>  
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>  
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.
>  
> NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
>  
> Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.
>  
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
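
A minimal sketch of what a compliant request might look like, assuming the usual E-utilities base URL (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/) and the esearch program. The tool name and e-mail address are placeholders, and the helper names eutils_url and uri_escape_simple are hypothetical, not BioPerl API. The point is only that every request carries non-empty tool and email values.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 "unreserved" set.
# Assumption of this sketch: values are plain ASCII.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Build an E-utility URL, refusing to do so without tool and email.
sub eutils_url {
    my ($program, %params) = @_;
    for my $required (qw(tool email)) {
        die "E-utility requests need a non-empty '$required' parameter\n"
            unless defined $params{$required} && length $params{$required};
    }
    my $query = join '&',
        map { "$_=" . uri_escape_simple($params{$_}) } sort keys %params;
    return "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/${program}.fcgi?$query";
}

print eutils_url(
    'esearch',
    db    => 'nucleotide',
    term  => 'Trypanosoma brucei[ORGN]',
    tool  => 'my_blast_script',        # placeholder tool name
    email => 'someone@example.org',    # placeholder contact address
), "\n";
```

Centralising URL construction in one helper means a missing tool or email fails loudly in the script itself rather than as an error returned by NCBI.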


From maj at fortinbras.us  Tue Dec  1 21:27:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 21:27:06 -0500
Subject: [Bioperl-l] test test test
Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife>

MAJ

From ocarnorsk138 at gmail.com  Tue Dec  1 21:59:48 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Tue, 1 Dec 2009 23:59:48 -0300
Subject: [Bioperl-l] test test test
In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>

test test test test back


O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.


2009/12/1 Mark A. Jensen <maj at fortinbras.us>

> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us  Tue Dec  1 22:08:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 22:08:23 -0500
Subject: [Bioperl-l] test test test
In-Reply-To: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
	<b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
Message-ID: <CC7F9A12F9474D2BB5DC4E69190F2AE6@NewLife>

I love when people are paying attention!
  ----- Original Message ----- 
  From: Ocar Campos 
  To: Mark A. Jensen ; Bioperl Mailing List. 
  Sent: Tuesday, December 01, 2009 9:59 PM
  Subject: Re: [Bioperl-l] test test test


  test test test test back


  O'car Campos C.
  Bioinformatics Engineering Student.
  University of Talca.
  Chile.


  2009/12/1 Mark A. Jensen <maj at fortinbras.us>

    MAJ


From rtbio.2009 at gmail.com  Wed Dec  2 07:07:08 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Wed, 2 Dec 2009 13:07:08 +0100
Subject: [Bioperl-l] Remote blast
Message-ID: <c7cac1600912020407j176c83edm9f5a3d151f507bd2@mail.gmail.com>

Hello everyone,

I have a problem. I am new to BioPerl. I am working on an RNAi tool in which a
CGI script connects to NCBI BLAST using the remote BLAST program, i.e., the
sequence entered on the HTML page is taken as input and a remote BLAST search
is run on it. However, I have a problem in the remote BLAST code.

My code goes like this

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
$input1= $_[0];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


 while (my $input = $str->next_seq())

{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

  print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
          my $filename = $result->query_name()."\.out";
           $factory->save_output($filename);
          $factory->remove_rid($rid);
         #       open(BLASTDEBUGFILE,'>',$blastdebugfile);
  #     print BLASTDEBUGFILE "Test1  $result";
   #     close(BLASTDEBUGFILE);

     open(OUTFILE,'>',$outfile);
     print OUTFILE "Test2 $result->database_name()";
     close(OUTFILE);

    while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);

              # open(OUTFILE,'>',$outfile);
              # print OUTFILE "in while hits";
              #close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
}
# open(OUTFILE,'>',$outfile);
  #print OUTFILE $seqs[0];
 # close(OUTFILE);

return(@seqs);
}

In the code above, my program gets as far as the 'else' branch and writes the
output file, i.e., this step:
my $filename = $result->query_name()."\.out";

But when it should enter the next while loop, where I would get the hits, the
program does not enter the loop, i.e., it does not get into this:
while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);


Hence I am unable to get any hits for my query.
Ex: if the query's accession number is Tb11.02.2210, I just get a file named
Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.
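
Separately from the missing hits, one line in the code above will not print what it seems to intend: Perl does not interpolate method calls inside double-quoted strings, so "Test2 $result->database_name()" writes the stringified object followed by the literal text ->database_name(). A small sketch with a hypothetical stand-in class (Demo::Result is not a BioPerl class) shows the difference:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object; only the interpolation
# behaviour matters here, not BioPerl itself.
{
    package Demo::Result;
    sub new           { my ($class, %args) = @_; return bless {%args}, $class }
    sub database_name { return $_[0]{db} }
}

my $result = Demo::Result->new(db => 'refseq_rna');

# Method calls are not interpolated inside double quotes: $result is
# stringified and "->database_name()" is copied literally.
my $wrong = "Test2 $result->database_name()";

# Concatenate the method's return value instead.
my $right = "Test2 " . $result->database_name();

print "$wrong\n";
print "$right\n";
```

The first line printed contains something like Demo::Result=HASH(0x...) rather than the database name; the second contains the value returned by the method.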

From ashvip at gmail.com  Wed Dec  2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install BioPerl on my 32-bit Windows machine despite
repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
neither seems to work.
I have followed the instruction given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh,
Senior Research Fellow,
Centre for Cellular and Molecular Biology,
Hyderabad - 500007
India.
contact - 91-040-27192778

From scott at scottcain.net  Wed Dec  2 09:18:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Dec 2009 09:18:37 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com>

Hello Vipin,

"do not seem to work" doesn't give us much to go on; can you tell us
what happened?

Scott


On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh <ashvip at gmail.com> wrote:
> Dear Sir/Madam,
> I have not been able to install bioperl on my Windows 32 machine despite
> repeated attempts. I have tried both Active Perl and Strawberry perl but
> both do not seem to work.
> I have followed the instruction given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us  Wed Dec  2 09:18:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information; your commands, error messages you received.
Thanks, 
Mark
----- Original Message ----- 
From: "Vipin Singh" <ashvip at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 12:24 AM
Subject: [Bioperl-l] Problems with installation


> Dear Sir/Madam,
> I have not been able to install bioperl on my Windows 32 machine despite
> repeated attempts. I have tried both Active Perl and Strawberry perl but
> both do not seem to work.
> I have followed the instruction given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> 
>

From bcantarel at som.umaryland.edu  Wed Dec  2 13:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>

Hi all,
I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.


For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates?  Is there a command for that or will I have to just do the math?

Feature or Bug?


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore


From maj at fortinbras.us  Wed Dec  2 14:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
Message-ID: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an 
ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 1:36 PM
Subject: [Bioperl-l] Parsing Genbank


> Hi all,
> I am not sure if this is normal, but when I use SEQIO to parse genbank files, 
> it changes the coordinates of things on the minus strand.
>
>
> For example, I have a sequence that has a CDS on the minus strand, and it is 
> from 911 to 974.  The sequence is 974 nt.
>
> x $cds->start
> 1
> x $cds->end
> 64
>
> How can I get the original coordinates?  Is there a command for that or will I 
> have to just do the math?
>
> Feature or Bug?
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
>
>
> 


From bcantarel at som.umaryland.edu  Wed Dec  2 14:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code, the real code actually enters the data into a database.


use Bio::SeqIO;

# $gbkfile holds the path to the GenBank file and is set elsewhere.
my $in = Bio::SeqIO->new(-file   => $gbkfile,
                         -format => 'genbank');

W1: while (my $seq = $in->next_seq()) {
    my @feats = $seq->get_all_SeqFeatures();
    my $j = 0;
  F1: foreach my $cds (@feats) {
        next F1 unless ($cds->primary_tag() eq 'CDS');
        # do something with the cds start and cds end
    }
}
	 

LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...


From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore
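
As a side note on the arithmetic: 1 and 64 are exactly what 911..974 becomes when positions are counted from the other end of a 974 nt sequence (974 - 974 + 1 = 1 and 974 - 911 + 1 = 64). That is consistent with, though not proof of, the feature being reported relative to the reverse strand. The sketch below does that conversion in plain Perl. parse_simple_location and flip_coords are hypothetical helpers written for this illustration, not BioPerl methods, and only simple start..end locations are handled.

```perl
use strict;
use warnings;

# Parse a simple GenBank location such as "911..974" or
# "complement(911..974)". Joins, fuzzy ends (<, >) and remote references
# are deliberately not handled in this sketch.
sub parse_simple_location {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.+)\)$/) {
        $strand = -1;
        $loc    = $1;
    }
    my ($start, $end) = $loc =~ /^(\d+)\.\.(\d+)$/
        or die "unsupported location: $loc\n";
    return ($start, $end, $strand);
}

# Convert between forward-strand coordinates and coordinates counted from
# the other end of a sequence of length $len. The mapping is its own inverse.
sub flip_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

my ($start, $end, $strand) = parse_simple_location('complement(911..974)');
my ($rstart, $rend)        = flip_coords($start, $end, 974);
print "in file: $start..$end (strand $strand); from the other end: $rstart..$rend\n";
```

Applying flip_coords to (1, 64) with length 974 returns (911, 974), so the same function maps the unexpected values back to the coordinates in the file.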

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
> 
> 
>> Hi all,
>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.
>> 
>> 
>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.
>> 
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>> 
>> How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
>> 
>> Feature or Bug?
>> 
>> 
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>> 
>> 
>> 
> 


From maj at fortinbras.us  Wed Dec  2 14:48:44 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:48:44 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>

with fake seq data and that header, I don't get a problem:

  DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'

Are you using the latest BioPerl (1.6.1 or the trunk) ?
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank


Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
 F1:foreach $cds (@feats) {
next F1 unless ($cds->primary_tag() eq 'CDS');
###>> debugger stops here for above output

#do something with the cds start and cds end
}
}


LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 
19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence?.


From this example, I would like to get the coordinates 911 and 974, rather than 
1 and 64.


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an 
> ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel" 
> <bcantarel at som.umaryland.edu>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
>
>
>> Hi all,
>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, 
>> it changes the coordinates of things on the minus strand.
>>
>>
>> For example, I have a sequence that has a CDS on the minus strand, and it is 
>> from 911 to 974.  The sequence is 974 nt.
>>
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>>
>> How can I get the original coordinates?  Is there a command for that or will 
>> I have to just do the math?
>>
>> Feature or Bug?
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>>
>>
>>
>




From cjfields at illinois.edu  Wed Dec  2 14:39:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 13:39:40 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu>

That one's odd; the coordinates should relate back to the original sequence.  Any chance you could pass on the sequence file so we can confirm it?  (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)

chris

On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote:

> Here is some of my code, the real code actually enters the data into a database.
> 
> 
> $in  = Bio::SeqIO->new(-file => $gbkfile,
> 		       '-format' => 'genbank');
> 
> W1:while (my $seq = $in->next_seq()) {
>  my @feats = $seq->get_all_SeqFeatures();
>  my $j = 0;
> F1:foreach $cds (@feats) {
> 	next F1 unless ($cds->primary_tag() eq 'CDS');
> 	#do something with the cds start and cds end
> 	}
> }
> 	 
> 
> LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
> ACCESSION   subjpool12_contig3
> KEYWORDS    .
> SOURCE      human metagenome
>  ORGANISM  human metagenome
>            unclassified sequences; organismal metagenomes,metagenomes.
> FEATURES             Location/Qualifiers
>     source          1..974
>                     /mol_type="genomic DNA"
>                     /isolation_source="Homo sapiens"
>                     /organism="human metagenome"
>                     /collection_date="19-Nov-2009"
>     CDS             complement(911..974)
>                     /locus_tag="subjpool12_contig3|metagene|gene_2"
>                     /translation="IRIMTVELINPYIRHVEHST"
>                     /score="2.52804"
>                     /product="hypothetical protein"
>                     /note="score=2.52804"
>                     /note="score=2.52804"
>                     /note="frame=1"
> ORIGIN
> #some sequence?.
> 
> 
> 
> 
> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
> 
> 
> 
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
> 
> On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
> 
>> Hi Brandi-
>> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
>> Can you elaborate by posting your code?
>> cheers,
>> MAJ
>> ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, December 02, 2009 1:36 PM
>> Subject: [Bioperl-l] Parsing Genbank
>> 
>> 
>>> Hi all,
>>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.
>>> 
>>> 
>>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.
>>> 
>>> x $cds->start
>>> 1
>>> x $cds->end
>>> 64
>>> 
>>> How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
>>> 
>>> Feature or Bug?
>>> 
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~
>>> Brandi Cantarel, PhD
>>> Bioinformatics Analyst
>>> Institute for Genome Sciences
>>> School of Medicine
>>> University of Maryland, Baltimore
>>> 
>>> 
>>> 
>> 
> 
> 


From maj at fortinbras.us  Wed Dec  2 15:52:28 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 15:52:28 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife>

Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
as if there is a bug. If you can provide data that can reproduce
it, as Chris suggests, we can get onto it. 
thanks MAJ
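
The mapping between 1.006 and 1.6 follows Perl's decimal version convention, where each group of three digits after the point is one component. A short sketch using the core version module; the value 1.006001 for the 1.6.1 release is an assumption based on that convention, not something checked against the release:

```perl
use strict;
use warnings;
use version;

# Decimal Perl versions pack three digits per component after the point,
# so 1.006 reads as 1.6.0 and 1.006001 as 1.6.1.
for my $decimal (qw(1.006 1.006001)) {
    printf "%-9s -> %s\n", $decimal, version->parse($decimal)->normal;
}
```

The normal method returns the dotted form with a leading "v", which is easier to compare by eye against release names.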
  ----- Original Message ----- 
  From: Brandi Cantarel 
  To: Mark A. Jensen 
  Sent: Wednesday, December 02, 2009 3:38 PM
  Subject: Re: [Bioperl-l] Parsing Genbank


  How can I tell what version I am using? When I use the command from the website:


  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'


  I get 1.006, but the bioperl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release?


  Brandi


  On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:


    with fake seq data and that header, I don't get a problem:

    DB<2> x $cds->location
    0  Bio::Location::Simple=HASH(0x37b1df4)
     '_end' => 974
     '_location_type' => 'EXACT'
     '_root_verbose' => 0
     '_seqid' => 'subjpool12_contig3'
     '_start' => 911
     '_strand' => '-1'

    Are you using the latest BioPerl (1.6.1 or the trunk) ?
    MAJ
    ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
    Cc: <bioperl-l at lists.open-bio.org>
    Sent: Wednesday, December 02, 2009 2:29 PM
    Subject: Re: [Bioperl-l] Parsing Genbank


    Here is some of my code, the real code actually enters the data into a database.


    use Bio::SeqIO;

    my $in = Bio::SeqIO->new(-file   => $gbkfile,
                             -format => 'genbank');

    W1: while (my $seq = $in->next_seq()) {
        my @feats = $seq->get_all_SeqFeatures();
        my $j = 0;
        F1: foreach my $cds (@feats) {
            next F1 unless ($cds->primary_tag() eq 'CDS');
            ###>> debugger stops here for above output

            # do something with the cds start and cds end
        }
    }


    LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
    ACCESSION   subjpool12_contig3
    KEYWORDS    .
    SOURCE      human metagenome
    ORGANISM  human metagenome
              unclassified sequences; organismal metagenomes,metagenomes.
    FEATURES             Location/Qualifiers
       source          1..974
                       /mol_type="genomic DNA"
                       /isolation_source="Homo sapiens"
                       /organism="human metagenome"
                       /collection_date="19-Nov-2009"
       CDS             complement(911..974)
                       /locus_tag="subjpool12_contig3|metagene|gene_2"
                       /translation="IRIMTVELINPYIRHVEHST"
                       /score="2.52804"
                       /product="hypothetical protein"
                       /note="score=2.52804"
                       /note="score=2.52804"
                       /note="frame=1"
    ORIGIN
    #some sequence...


      From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
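
If it does come down to doing the math, here is a minimal sketch. It assumes the 1 and 64 were counted from the other end of the 974 nt sequence, which the numbers in this example are consistent with but which nobody in the thread has confirmed. flip_coords is a made-up name, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: map coordinates counted from the far end of a
# sequence of length $len back to forward-strand coordinates.
sub flip_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

my ($start, $end) = flip_coords(1, 64, 974);
print "$start..$end\n";    # 911..974
```

Applying the function twice returns the original pair, so it can also be used to check which coordinate system a given feature is in.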


    ~~~~~~~~~~~~~~~~~~~~
    Brandi Cantarel, PhD
    Bioinformatics Analyst
    Institute for Genome Sciences
    School of Medicine
    University of Maryland, Baltimore

    On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:


      Hi Brandi-

      If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.

      Can you elaborate by posting your code?

      cheers,

      MAJ

      ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>

      To: <bioperl-l at lists.open-bio.org>

      Sent: Wednesday, December 02, 2009 1:36 PM

      Subject: [Bioperl-l] Parsing Genbank


        Hi all,

        I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.


        For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.


        x $cds->start

        1

        x $cds->end

        64


        How can I get the original coordinates?  Is there a command for that or will I have to just do the math?


        Feature or Bug?


        ~~~~~~~~~~~~~~~~~~~~

        Brandi Cantarel, PhD

        Bioinformatics Analyst

        Institute for Genome Sciences

        School of Medicine

        University of Maryland, Baltimore






From cjfields at illinois.edu  Wed Dec  2 16:07:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 15:07:58 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
	<07332179362A4D53ACAA9A72AD208049@NewLife>
Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu>

One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since Gbrowse extensively uses SeqFeatures (any changes should have popped out along the way). 

Not much we can do unless we have something to help confirm the problem.  Also might help to know the source of the genbank file itself.

chris



From lstein at cshl.edu  Thu Dec  3 05:31:31 2009
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 Dec 2009 05:31:31 -0500
Subject: [Bioperl-l] modENCODE seeking data managers
Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com>

Hi All,

My apologies for spamming the list, but this announcement may be of
interest:


The modENCODE Data Coordinating Center (Model Organism Encyclopedia of DNA
Elements; www.modencode.org) is seeking data managers to gather and curate
large scale functional genomics data sets in fly and worm. For details, see
http://blog.modencode.org/?p=350.


Lincoln

-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>

From dan.bolser at gmail.com  Thu Dec  3 06:44:40 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 11:44:40 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>

Hi, can someone test the script here on zero length fasta / qual files?

http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ


It seems the output has an extra newline in the sequence part of the
output, which throws off scripts that rely on the 'four lines per
record' structure of the fastq (although I'm not sure if it's illegal
fastq).

Here is what I see

BEGIN
$ head one.fna
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ head one.qual
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ createFastq.plx one.fna one.qual
@FVF7ZWH02PFOVG


+FVF7ZWH02PFOVG

END


Currently I just put in a clause in the script to skip any zero length
sequences, but I think the Qual shouldn't output an extra newline like
this.


Cheers,
Dan.


--

JHB: Bioinformatics is Biology and Biology is Bioinformatics.

From biopython at maubp.freeserve.co.uk  Thu Dec  3 07:12:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 12:12:15 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>

On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi, can someone test the script here on zero length fasta / qual files?
>
> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>
> It seems the output has an extra newline in the sequence part of the
> output (which throws off scripts that rely on the 'four lines per
> record' structure of the fastq (although I'm not sure if it's illegal
> fastq).

Hi Dan,

The OBF consensus was FASTQ records with a zero length
sequence might be useful, and should be output as exactly
four lines (one blank sequence line, one blank quality line).
However for parsing, any number of blank lines should be OK.
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
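
To make the "exactly four lines" rule concrete, here is a minimal sketch in plain Perl. It is not BioPerl's FASTQ writer; fastq_record is a made-up helper, the bare "+" separator line is one of the legal choices, and the Sanger offset of 33 is assumed for the quality encoding.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl's FASTQ writer: build one record as
# exactly four lines. $quals is an array ref of PHRED scores, encoded
# here with the Sanger offset of 33.
sub fastq_record {
    my ($id, $desc, $seq, $quals) = @_;
    my $header = (defined $desc && length $desc) ? "$id $desc" : $id;
    my $qual   = join '', map { chr($_ + 33) } @$quals;
    return join("\n", "\@$header", $seq, '+', $qual) . "\n";
}

# A zero-length record still comes out as four lines, two of them blank.
print fastq_record('FVF7ZWH02PFOVG', 'length=0 xy=2116_2074 region=2', '', []);
print fastq_record('read2', '', 'ACGT', [40, 40, 30, 2]);
```

Because the sequence and quality lines are always emitted once each, an empty sequence cannot produce a fifth line, and the description is carried through on the header line.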

I can confirm the perl script currently outputs a FASTQ file
with TWO blank lines for the sequence, giving five lines in
total for the zero length record. That does suggest a bug.
What version of BioPerl are you running?

Peter

P.S. The script is throwing away any description after the
identifier.

From dan.bolser at gmail.com  Thu Dec  3 08:07:27 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 13:07:27 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi, can someone test the script here on zero length fasta / qual files?
>>
>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>
>> It seems the output has an extra newline in the sequence part of the
>> output (which throws off scripts that rely on the 'four lines per
>> record' structure of the fastq (although I'm not sure if it's illegal
>> fastq).
>
> Hi Dan,
>
> The OBF consensus was FASTQ records with a zero length
> sequence might be useful, and should be output as exactly
> four lines (one blank sequence line, one blank quality line).
> However for parsing, any number of blank lines should be OK.
> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>
> I can confirm the perl script currently outputs a FASTQ file
> with TWO blank lines for the sequence, giving five lines in
> total for the zero length record. That does suggest a bug.
> What version of BioPerl are you running?

Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I
asked this question of the list rather than filing a bug report. What
version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
for the info).


> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly,
MediaWiki's diff features are less than optimal, so developing scripts
on the wiki isn't ideal. Anyone know how to plug git-hub into a script
apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could
basically build a whole release in the wiki. Which would be fun if
nothing else!


-- 

JHP: Biology is bioinformatics and bioinformatics is biology.

From heyne at informatik.uni-freiburg.de  Thu Dec  3 08:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

...

	my $ret;
	if ($self->strand() >= 0) {
		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
	} else {
		$ret = $id . $v. $char1 . $end . $char2 . $st ;
	}
	return $ret;
}
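
The idea behind the change above can be sketched standalone, without Bio::LocatableSeq: keep start <= end internally and let the printed order carry the strand. format_nse and parse_nse are made-up names for illustration, and the id is treated as one string with the version already attached.

```perl
use strict;
use warnings;

# Hypothetical helpers, independent of Bio::LocatableSeq.
# Internally start <= end; the printed order encodes the strand.
sub format_nse {
    my ($id, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$id/$start-$end";
}

sub parse_nse {
    my ($nse) = @_;
    my ($id, $first, $second) = $nse =~ m{^(\S+)/(\d+)-(\d+)$}
        or die "not a name/start-end string: $nse";
    return $first > $second
        ? ($id, $second, $first, -1)
        : ($id, $first,  $second, 1);
}

print format_nse('AB194432.1', 846, 908, -1), "\n";    # AB194432.1/908-846
print join(',', parse_nse('AB194432.1/908-846')), "\n"; # AB194432.1,846,908,-1
```

With both directions in place, reading a name and writing it back gives the same string for either strand.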

Then I noticed, while using $aln->remove_seq(), that it cannot
remove a sequence because it uses the wrong NSE to look up sequences. I
changed the following:

SimpleAlign.pm

sub remove_seq {

...
	$id = $seq->id();
    	$start = $seq->start();
    	$end  = $seq->end();

## changed code:

	my $v = $seq->version ? '.'.$seq->version : '';
    	if ($seq->strand >=0){
		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
	} elsif ($seq->strand == -1){
		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
	}	
...

}

The above code in LocatableSeq.pm worked when I read an
alignment in stockholm format and wrote it out in clustalw format. But
when I read an alignment in clustalw and wrote it out as stockholm (or
something else) it didn't work, as the strand is not set correctly in
ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

...

    my ( $sname, $start, $end, $strand );    ## strand added
    foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        ## new, standard = 0???  Reset for every sequence so that a name
        ## without coordinates does not inherit the previous strand.
        $strand = 0;
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
            ( $sname, $start, $end ) = ( $1, $2, $3 );
            $strand = 1;                                          ## new
            if ( $start > $end ) {                                ## new
                ( $start, $end, $strand ) = ( $end, $start, -1 ); ## new
            }                                                     ## new
        }
        else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq    => $alignments{$name},
            -id     => $sname,
            -start  => $start,
            -end    => $end,
            -strand => $strand    ## new
        );

...

}

So I don't know whether I changed things in the right places; I only
found them because I happened to use certain functions. I don't know how
broadly a changed NSE in LocatableSeq.pm affects other modules and
functions. But I'm happy with my changes (so far :-)...).

Will you change this in bioperl trunk in the way you proposed?

Thanks!

steffen


Chris Fields schrieb:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
> 
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
> 
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks.  Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
> 
> Does anyone have a problem with me adding this in?  It will change
> output for only those instances where the strand is -1, so
> 
> AB194432.1/908-846
> 
> would be start = 846, end = 908, strand = -1
> 
> AB194432.1/846-908
> 
> would be start = 846, end = 908, strand = 1
> 
> chris


-- 
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany

Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de


From cjfields at illinois.edu  Thu Dec  3 08:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>

Dan,

On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

> 2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
>> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> Hi, can someone test the script here on zero length fasta / qual files?
>>> 
>>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>> 
>>> It seems the output has an extra newline in the sequence part of the
>>> output (which throws off scripts that rely on the 'four lines per
>>> record' structure of the fastq (although I'm not sure if it's illegal
>>> fastq).
>> 
>> Hi Dan,
>> 
>> The OBF consensus was FASTQ records with a zero length
>> sequence might be useful, and should be output as exactly
>> four lines (one blank sequence line, one blank quality line).
>> However for parsing, any number of blank lines should be OK.
>> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>> 
>> I can confirm the perl script currently outputs a FASTQ file
>> with TWO blank lines for the sequence, giving five lines in
>> total for the zero length record. That does suggest a bug.
>> What version of BioPerl are you running?
> 
> Hi Peter,
> 
> Basically, I'm not running the 'latest' version of BP, which is why I
> asked this question of the list rather than filing a bug report. What
> version are you running? ;-)
> 
> Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
> for the info).

FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN).  Basically, it now parses all three FASTQ variants.  However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1.  Peter can you confirm that?

>> Peter
>> 
>> P.S. The script is throwing away any description after the
>> identifier.
> 
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
> 
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc.  Think SourceForge, but a lot nicer and with no ads ;>

BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN). Basically, it now parses
> all three FASTQ variants. However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?

I had BioPerl from SVN circa 1.6.1 (not sure if this was before
or after the release of 1.6.1 now):

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069
$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
1.0069

If the tuples mean anything to you:

$ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
49.46.48.48.54.57
$ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
49.46.48.48.54.57

I just updated to revision 16435, and retested. I get the same
BioPerl version numbers, and the same extra blank line in the
sequence FASTQ output as Dan reported.
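
The tuples above are less mysterious than they look. A short sketch using only core Perl: when %vd is applied to an ordinary string, it prints the character code of each character joined by dots, so the string "1.0069" comes out as those six numbers.

```perl
use strict;
use warnings;

my $version = '1.0069';

# %vd prints the ordinal of every character in the string, dot-separated.
my $tuple = sprintf '%vd', $version;
print "$tuple\n";    # 49.46.48.48.54.57

# Turning the codes back into characters recovers the original string.
my $decoded = join '', map { chr($_) } split /\./, $tuple;
print "$decoded\n";  # 1.0069
```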

Peter


From cjfields at illinois.edu  Thu Dec  3 09:39:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 08:39:35 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>

On Dec 3, 2009, at 8:20 AM, Peter wrote:

> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> FASTQ parsing had undergone a major revision prior to
>> 1.6.1 (the latest release in CPAN).  Basically, it now parses
>> all three FASTQ variants.  However, Peter indicates there
>> may still be a problem, and it's likely he's running 1.6.1.
>> Peter can you confirm that?
> 
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
> 
> If the tuples mean anything to you:
> 
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
> 
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
> 
> Peter

Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)?  Otherwise it might get lost on the mail list or wiki.

chris

From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.

> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

  my $bsq_obj =
    Bio::Seq::Quality->
	new( -id   => $seq_obj->id,
	     -seq  => $seq_obj->seq,
	     -qual => $qual_obj->qual,
	   );

With:

  my $bsq_obj =
    Bio::Seq::Quality->
        new( -id          => $seq_obj->id,
             -description => $seq_obj->description,
             -seq         => $seq_obj->seq,
             -qual        => $qual_obj->qual,
           );

Look - I seem to be learning Perl by osmosis ;)

Peter


From dan.bolser at gmail.com  Thu Dec  3 11:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
	<320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!


From dan.bolser at gmail.com  Thu Dec  3 11:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields <cjfields at illinois.edu>:
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs


From florent.angly at gmail.com  Thu Dec  3 13:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
	<4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding 
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related 
modules use LocatableSeq objects and specify the sequence start but 
not the sequence end. This works in most cases, but I've recently 
run into rare error messages caused by the end of the sequence not 
having been set explicitly. I've been unable to put together a small 
test case that reproduces the bug easily.

My question is: if the start of the sequence is set, is it mandatory to 
set the end as well? If so, then maybe the documentation needs 
to be explicit about it, and maybe there should be a check that 
enforces that the end is set. In fact, it seems that if I provide a 
sequence and its start position, the LocatableSeq code should be able to 
calculate its end automatically, no?
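
To make the last point concrete, here is a minimal sketch of how the end could be derived from the start and the sequence string. It is not the LocatableSeq implementation; end_from_start is a hypothetical helper, and it assumes that only letters count as residues, in the same spirit as the gap-stripping in the ClustalW code quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): derive the end coordinate of an
# aligned sequence from its start, counting residues only and skipping gaps.
sub end_from_start {
    my ($seq, $start) = @_;

    # tr/// without a replacement list just counts the matching characters.
    # Assumption: letters are residues, everything else ('-', '.', ...) is a gap.
    my $residues = ($seq =~ tr/A-Za-z//);

    return undef if $residues == 0;    # an all-gap string has no meaningful end
    return $start + $residues - 1;
}

# 'AC--GT-A' holds 5 residues, so with start 10 the end is 10 + 5 - 1.
print end_from_start('AC--GT-A', 10), "\n";    # prints 14
```

Whether an all-gap sequence should give undef, or an end of start - 1, is a design decision I have left open here.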

Florent


Steffen Heyne wrote:
> Hello,
>
> so I tried to fix the problem with the location. Currently it works for
> me with the following changes:
>
> LocatableSeq.pm
>
> sub get_nse{
>
> ...
>
> 	my $ret;
> 	if ($self->strand() >= 0) {
> 		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
> 	} else {
> 		$ret = $id . $v. $char1 . $end . $char2 . $st ;
> 	}
> 	return $ret;
> }
>
> Then I noticed while using $aln->remove_seq() that it cannot
> remove a seq, as it uses a wrong NSE to look up sequences. I changed the
> following:
>
> SimpleAlign.pm
>
> sub remove_seq {
>
> ...
> 	$id = $seq->id();
>     	$start = $seq->start();
>     	$end  = $seq->end();
>
> ## changed code:
>
> 	my $v = $seq->version ? '.'.$seq->version : '';
>     	if ($seq->strand >=0){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
> 	} elsif ($seq->strand == -1){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
> 	}	
> ...
>
> }
>
> The above code in LocatableSeq.pm worked when I read an
> alignment in stockholm format and wrote it out in clustalw format. But
> if I read an alignment in clustalw and write it out as stockholm (or
> something else) it didn't work, as the strand is not correctly set in
> ClustalW::next_aln. It works with the following changes:
>
> ClustalW.pm
>
> sub next_aln{
>
> ...
>
> 	my ( $sname, $start, $end, $strand );	## strand added
> 	$strand = 0;				## new, standard = 0???
>     	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys
> %alignments ) {
>         if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
>         	( $sname, $start, $end ) = ( $1, $2, $3 );
> 		$strand = 1;			## new			
> 		if ($start > $end) {		## new
>        		($start, $end, $strand) = ($end, $start, -1); ##new
> 		}				## new
> 	
>       }
>         else {
>             ( $sname, $start ) = ( $name, 1 );
>             my $str = $alignments{$name};
>             $str =~ s/[^A-Za-z]//g;
>             $end = length($str);
>         }
>
>         my $seq = Bio::LocatableSeq->new(
>             -seq   => $alignments{$name},
>             -id    => $sname,
>             -start => $start,
>             -end   => $end,
> 	    -strand=> $strand			## new
>         );
>
> ...
>
> }
>
> So I don't know if I changed things in the right places. I
> found them only because I used certain functions. I don't know how broadly
> a changed NSE in LocatableSeq.pm affects other modules and
> functions. But I'm happy with my changes (so far :-)...).
>
> Will you change this to your proposed way in bioperl trunk?
>
> Thanks!
>
> steffen
>
>
> Chris Fields schrieb:
>   
>> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>>
>>     
>>> Hi,
>>>
>>> I'm using Bioperl for my research and it is very useful! Thank you!
>>>
>>> Currently I have a problem with location tags of sequences. I read in
>>> seed alignments of Rfam (in stockholm format, but I think it is
>>> similar to other formats).
>>>
>>> If the location is like:
>>>
>>> AB194432.1/908-846
>>>
>>> the start/end values are changed to
>>>
>>> $seq->start = 846
>>> $seq->end = 908
>>>
>>> and therefore the new location (e.g.$seq->get_nse) is:
>>>
>>> AB194432.1/846-908
>>>
>>> The $seq->strand tag is correctly set to -1 in this case, but if the
>>> alignment is written out again (clustal, stockholm,...) this strand
>>> info is lost and the sequences have this "wrong" location. But this
>>> information is important in respect to the sequence accession number.
>>>
>>> Is there a way to set the location back to the original one or is this
>>> behavior desired? Any manually setting with $seq->start($val) failed
>>> due to automatic checking.
>>>
>>> I'm using bioperl 1.6.1
>>>
>>> Thanks!
>>>
>>> steffen
>>>       
>> This is a definite bug. We recently discussed amending the NSE format
>> due to this (the subject came up over the last few months or so); it's
>> fallen through the cracks.  Fortunately it is very easy to fix (the
>> relevant method is in LocatableSeq).
>>
>> Does anyone have a problem with me adding this in?  It will change
>> output for only those instances where the strand is -1, so
>>
>> AB194432.1/908-846
>>
>> would be start = 846, end = 908, strand = -1
>>
>> AB194432.1/846-908
>>
>> would be start = 846, end = 908, strand = 1
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
>
>   
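
The NSE convention proposed at the end of the quoted thread (coordinates written high-low when the strand is -1) can be sketched as a round trip. This is a standalone illustration, not the LocatableSeq code: parse_nse and format_nse are hypothetical names, and the version is simply kept as part of the id.

```perl
use strict;
use warnings;

# Hypothetical helper: split "id/from-to" into id, start, end and strand.
# A descending range (from > to) is read as strand -1, with start <= end.
sub parse_nse {
    my ($nse) = @_;
    my ($id, $from, $to) = $nse =~ m{^(\S+)/(\d+)-(\d+)$}
        or return;    # not an NSE string
    my $strand = 1;
    if ($from > $to) {
        ($from, $to, $strand) = ($to, $from, -1);
    }
    return { id => $id, start => $from, end => $to, strand => $strand };
}

# Hypothetical helper: write the coordinates back, swapped for strand -1,
# so that the original location string survives a read/write cycle.
sub format_nse {
    my ($loc) = @_;
    my ($first, $second) = $loc->{strand} < 0
        ? ($loc->{end},   $loc->{start})
        : ($loc->{start}, $loc->{end});
    return sprintf '%s/%d-%d', $loc->{id}, $first, $second;
}

my $parsed = parse_nse('AB194432.1/908-846');
print "start=$parsed->{start} end=$parsed->{end} strand=$parsed->{strand}\n";
print format_nse($parsed), "\n";    # prints AB194432.1/908-846
```

Note that a location with strand 0 is written in ascending order here, which matches the >= 0 test in the quoted get_nse change.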


From cjfields at illinois.edu  Thu Dec  3 23:16:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 22:16:48 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu>


On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote:

> 2009/12/3 Chris Fields <cjfields at illinois.edu>:
>> Dan,
>> 
>> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:
> 
> ...
> 
>>> I'm wondering, because with the FlaggedRevs extension you could
>>> basically build a whole release in the wiki. Which would be fun if
>>> nothing else!
>> 
>> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (
> 
> I never said it would be beneficial, only that it would be fun.
> 
> http://www.mediawiki.org/wiki/Flaggedrevs

Ah, okay, that makes some sense.  

Just to stay on subject, I committed a fix (r16439) to bioperl-live that addresses the additional newline issue.

chris

From rtbio.2009 at gmail.com  Fri Dec  4 08:57:21 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 4 Dec 2009 14:57:21 +0100
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
Message-ID: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>

Hello all,

I am working on Remote blast. Here, I am trying to get 2 parameters into the
remote blast code. They are:

1. The input sequence that has to be sent to blast

2. Organism (the organism which has to be searched for, e.g. Trypanosoma
brucei)

When I tried to take the organism parameter as an input from the
user, through a web page, the Remote blast was not giving any results, i.e., it
says that there are no alignments found.

But when I hard-coded the organism in the code, it gives me the results, i.e.,
3 hits.

I could not understand this problem. Could anybody please help me in this
regard?

My code is

sub blastcode
{

$input1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
               print OUTFILE @params;
              close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-Organism' => $organism );

while (my $input = $str->next_seq())

{
#Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

   # my $r = $factory->submit_blast('amino.fa');

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

      #    open(BLASTDEBUGFILE,'>',$debugfile);
       #   print BLASTDEBUGFILE $result->next_hit();
        #  close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);
$factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);
}

Regards,
Roopa.

From cjfields at illinois.edu  Fri Dec  4 09:59:17 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Dec 2009 08:59:17 -0600
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
In-Reply-To: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
References: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu>

Roopa,

At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports).  See here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155

Also, are the returned hits specific for the genome?  You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why):

http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi
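
One more thing that may be worth checking in the posted code: the ENTREZ_QUERY value is written in single quotes, and Perl does not interpolate variables inside single-quoted strings, so the literal text $organism[ORGN] would be sent instead of the organism name. A minimal sketch of building the string, assuming that is what is happening; entrez_organism_query is a hypothetical helper, and the whitespace trimming is my guess at what web form input might need:

```perl
use strict;
use warnings;

# Hypothetical helper: build an Entrez organism filter from user input.
sub entrez_organism_query {
    my ($organism) = @_;
    $organism =~ s/^\s+|\s+$//g;    # form input may carry stray whitespace or a newline
    # Plain concatenation avoids "$organism[...]" being parsed as an element
    # of an array @organism inside a double-quoted string.
    return $organism . '[ORGN]';
}

# Single quotes: no interpolation, the variable name is sent as-is.
my $literal = '$organism[ORGN]';
print "$literal\n";                                          # prints $organism[ORGN]

print entrez_organism_query(" Trypanosoma brucei\n"), "\n";  # prints Trypanosoma brucei[ORGN]

# In the posted script the result would then be assigned along the lines of:
#   $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = entrez_organism_query($organ);
```
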

chris 
 
On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I am working on Remote blast. Here, I am trying to get 2 parameters into the
> remote blast code. They are:
> 
> 1. The input sequence that has to be sent to blast
> 
> 2. Organism (the organism which has to be searched for, e.g. Trypanosoma
> brucei)
> 
> When I tried to take the organism parameter as an input from the
> user, through a web page, the Remote blast was not giving any results, i.e., it
> says that there are no alignments found.
> 
> But when I hard-coded the organism in the code, it gives me the results, i.e.,
> 3 hits.
> 
> I could not understand this problem. Could anybody please help me in this
> regard?
> 
> My code is
> 
> sub blastcode
> {
> 
> $input1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $input1;
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>         '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>               print OUTFILE @params;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a parameter
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
> #change a parameter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
> '-Organism' => $organism );
> 
> while (my $input = $str->next_seq())
> 
> {
> #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>   my $r = $factory->submit_blast($input);
> 
>   # my $r = $factory->submit_blast('amino.fa');
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
> 
>     foreach my $rid ( @rids ) {
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>      #    open(BLASTDEBUGFILE,'>',$debugfile);
>       #   print BLASTDEBUGFILE $result->next_hit();
>        #  close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> $factory->save_output($filename);
> 
>       # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> }
> 
> Regards,
> Roopa.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From robert.bradbury at gmail.com  Fri Dec  4 13:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID: <deaa866a0912041027r71c49f58n7d467f050c2f49c6@mail.gmail.com>

Background:
I have been involved in aging research off and on for ~16 years.  My initial
focus was on the eventual decline of the "program" (because DNA has no ECC
and only limited redundancy); my initial work (in the early 1990s)
therefore focused on DNA repair genes (of which there are about 150 in the
human genome) [1,2].  Most recently I have focused on the DNA double-strand
break repair processes (NHEJ) as a fundamental cause of aging, because they may
fundamentally corrupt the genomes of individual cells.  (And as most
programmers would agree -- break the code and you break the program.)
Michael Lieber at UCLA has estimated that by the time a human is ~70, on the
order of several hundred genes in one's cells have been corrupted (which may
have an indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene,
there are on the order of 18 SNPs and 8 possible phosphorylation sites (not
to mention other potential modification sites). Add to this the
fact that methionine and tryptophan, and to a lesser extent cysteine, are more
susceptible to single-base mutations (because of how the codon->amino
acid coding responds to even single-base mutations/repairs). There are
various programs to analyze such proteins for the critical sites -- SIFT and
the various programs pointed to by their sites.  Now it seems to me that one
could attack this problem by integrating SNPs, mutations, etc. at the
critical sites, where "critical" may or may not coincide with normal SNPs
(which presumably are primarily at non-critical sites) and covers those
positions where changing the coding sequence to a non-synonymous amino acid
potentially breaks the protein (the real interpretation of which will not be
understood until population studies are done).

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is
there not a BioPerl function which simply enables a visual interpretation of
the critical sites of the protein?"  I.e. some color-coded representation of
the protein (which presumably has some augmented functionality to determine
things like probability or statistical information).  I.e. hand the function
a .fasta file and it will give you a visual (colored) analysis of the
critical nature of specific amino acids -- something which could be used in
genomic or SNP analysis (such as I presume is being done by 23andme, as
well as other organizations) to begin to separate out the variations in the
human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and to a lesser extent Perl experience to
contribute to this -- I lack the BioPerl wisdom to make it generally
available.

If anyone has some suggestions as to what functions/modules might be of use
(in providing a "single-look" view of gene a.a. whose mutations may be more
or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press
(2006)
2. "Aging of the Genome",  J. Vijg, Oxford University Press (2007)

From maj at fortinbras.us  Sun Dec  6 17:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 6 Dec 2009 17:54:00 -0500
Subject: [Bioperl-l] bioperl-mode new feature: base class browsing
Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife>

Hi All, 
You can now browse pod of the base/parent classes of bioperl modules
with one keystroke using the latest update of bioperl-mode.
See http://bioperl.org/wiki/Emacs_bioperl-mode
Press "B" or "P" while in pod view to get a completion list 
of the parent classes for the module whose pod you're viewing.
cheers, 
MAJ


From mmokrejs at ribosome.natur.cuni.cz  Mon Dec  7 15:33:48 2009
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Mon, 07 Dec 2009 21:33:48 +0100
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
References: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz>

Hi,
  I just stumbled across this older posting ... maybe you want to exploit
SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has
a remote API available.
Martin

Robert Bradbury wrote:
> I would like to know whether or not anyone has attempted to create a
> "generalized" reciprocal blast component for BioPerl?
> 
> One sees papers all the time where they discuss running reciprocal blasts to
> compare a new species to an old "standard" species or a set of species or
> running an all-to-all set of comparisons to match up all of the "known"
> proteins from species and determine which are outliers (and therefore
> "novel").  There are also accumulating merged sets in NCBI HomoloGene (which
> seems to be a strict subset (perhaps a dozen) of "well sequenced" genomes)
> and Ensembl (which seems to be working with a much larger set of 40-50
> genomes, some of which may be somewhat incomplete and are certainly poorly
> "explored").
> 
> I have, I believe, seen code "fragments" from various authors, perhaps some
> on the BioPerl list, which perform some major subset of a typical
> "reciprocal blast".
> 
> Now what I am looking for is a relatively generalizable some-to-some
> reciprocal blast utility.  I want to be able to specify the genes (or gene
> family), e.g. some of the ~150 known DNA repair genes.  It would be helpful
> to also specify how "tolerant" the blast "true reciprocal" criteria are.
> There are some genes where there is a very strict 1-to-1 relationship across
> many genomes.  But for genes which involve relatively standard domains, e.g.
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
> example it's more like 5-to-5 and it would be really nice to be able to
> specify the strictness or quality level [1] for "matching" genes (and even
> which genes are to be excluded because they are known to be false
> homologues).
> 
> Then to top this off I want to be able to combine known public (e.g.
> HomoloGene / UniGene / Ensembl) databases with perhaps local private
> databases or database subsets (e.g. emerging or specialized genomes).
> 
> The goal here of course is to determine the precise phylogenetic relationships
> between all of the DNA repair genes and how there may be gain / loss /
> evolution of function that can be related to species characteristics (size,
> longevity, etc.).
> 
> Is there a generalized reciprocal blast component in BioPerl?  Or is it a
> "build-it-yourself" situation (that I have to believe has been built
> probably a few dozen times by various researchers / organizations /
> companies)?
> 
> Thanks,
> Robert Bradbury
> 
> 1. This would be handled in BioPerl with a customizable user function which
> could be tailored to handle specific cases -- for example a function which
> when handed a set of 100 potential "matches" could go through those 100
> matches, identify common domains, and then "re-rate" matches based on
> considerations such as the type and number of common domains, domains being
> in the same order, etc.  I.e. criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.

From robert.bradbury at gmail.com  Mon Dec  7 15:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>

This comment could also have the subject line: "Why does Bioperl/get_sequence
fork at all?"  Why are not all operations sequential?  And if this is a
"default" mode that I'm unaware of -- how do I ever write a reliable BioPerl
script if I have little or no control over what the program does when it
runs?  I may have days, so I can bear the burden of relatively slow results
(and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a
subset of the NCBI sequences.  It "mostly" works, in that it returns a
seemingly complete .bls result file but when attempting to look at the
sequences (so it can more accurately summarize the information from the
results than a standard blast report allows) it terminates prematurely with
errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
    $seq = get_sequence('GenBank', $accsn);

Now this can be a problem if NCBI/GenBank fails due to load conditions,
but this specific failure (which is repeatable) is most likely due to hitting
the user process limit.  Small blast results work
fine -- it's only when the Blast has returned several hundred hits that it runs
into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
queries (to get a sequence) with complete disregard of the environment
(process limits, NCBI limits, etc.).  But I do not know enough about how
this works to point a finger at some specific function.  It looks as if
get_sequence results are accumulated, summarized, etc. without ever
issuing the respective wait()-variant calls to collect former children.
[This IMO would clearly be a bug.]

This could be addressed by allowing the BioPerl library to run in 3 modes:
(1) completely synchronous -- if you fork, you wait until it's done and
collect "it"; if any fork fails, then one either collects the process or
switches to the non-conservative mode.

Robert

From cjfields at illinois.edu  Mon Dec  7 16:08:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 7 Dec 2009 15:08:40 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <A36A88C9-D94C-4559-A629-56EB8F374DAC@illinois.edu>

Robert, 

If you use the relevant components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not.  All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly.

See the POD for those specific modules for more information.
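
As a rough illustration of the direct route: the stack trace goes through _open_pipe, which as far as I can tell belongs to the default 'pipeline' retrieval type, so asking Bio::DB::GenBank for 'tempfile' retrieval should avoid the fork. Treat the following as a sketch under those assumptions, not tested against NCBI. The helper names, the batch size of 50 and the 3 second pause are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a long accession list into fixed-size batches.
sub batches {
    my ($size, @items) = @_;
    my @out;
    push @out, [ splice @items, 0, $size ] while @items;
    return @out;
}

# Sketch only: needs BioPerl installed and network access to NCBI.
# Returns a hash reference mapping accession => description line.
sub fetch_descriptions {
    my (@accessions) = @_;
    require Bio::DB::GenBank;    # loaded lazily so the helper above works without BioPerl

    # 'tempfile' retrieval is meant to fetch via a temporary file, no child process.
    my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');

    my %desc;
    for my $batch (batches(50, @accessions)) {       # batch size is an assumption
        my $stream = $gb->get_Stream_by_acc($batch); # one request per batch of accessions
        while (my $seq = $stream->next_seq) {
            $desc{ $seq->accession_number } = $seq->desc;
        }
        sleep 3;                                     # assumed pause between requests
    }
    return \%desc;
}

# Usage, e.g. with the hit names collected from a BLAST result:
#   my $desc = fetch_descriptions(@accessions);
```

Fetching in batches also means a few requests for several hundred hits, not one get_sequence call per hit.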

chris

On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote:

> This comment could also have the subject line: "Why does Bioperl/get_sequence
> fork at all?"  Why are not all operations sequential?  And if this is a
> "default" mode that I'm unaware of -- how do I ever write a reliable BioPerl
> script if I have little or no control over what the program does when it
> runs?  I may have days, so I can bear the burden of relatively slow results
> (and so can use sequential processing rather than parallel).
> 
> I've got a perl script that uses remote blast to blast a sequence against a
> subset of the NCBI sequences.  It "mostly" works, in that it returns a
> seemingly complete .bls result file but when attempting to look at the
> sequences (so it can more accurately summarize the information from the
> results than a standard blast report allows) it terminates prematurely with
> errors.
> 
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::WebDBSeqI::_open_pipe
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
> STACK: Bio::Perl::get_sequence
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
> STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
> STACK: /home/bradbury/Genomes/bin/RB.pl:155
> -----------------------------------------------------------
> 
> The precise line (in my code) which appears to be generating the error is:
>    $seq = get_sequence('GenBank', $accsn);
> 
> Now this can be a problem if NCBI/GenBank fails due to load conditions,
> but this specific failure (which is repeatable) is most likely due to hitting
> the user process limit.  Small blast results work
> fine -- it's only when the Blast has returned several hundred hits that it runs
> into this problem.
> 
> Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
> queries (to get a sequence) with complete disregard of the environment
> (process limits, NCBI limits, etc.).  But I do not know enough about how
> this works to point a finger at some specific function.  It looks as if
> get_sequence results are accumulated, summarized, etc. without ever
> issuing the respective wait()-variant calls to collect former children.
> [This IMO would clearly be a bug.]
> 
> This could be addressed by allowing the BioPerl library to run in 3 modes:
> (1) completely synchronous -- if you fork, you wait until it's done and
> collect "it"; if any fork fails, then one either collects the process or
> switches to the non-conservative mode.
> 
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Dec  7 16:24:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Dec 2009 13:24:54 -0800
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>

Robert -

You seem to be mixing up the remote BLAST and the remote sequence
retrieval problems. These messages are related to the remote retrieval
of sequences.
  It is hard to tell from your message specifically which modules you
are using or how you are querying NCBI - there are several ways to do
this, either with the NCBI tools or with Bio::DB::GenBank.
  If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to adhere to the wait variant that
NCBI requests, but I don't think the Bio::DB::GenBank get_Seq_by_acc
method does anything of the sort (at least not when it was originally
written).

I always advocate that if you want highly available and reliable access
to sequences, you should download nr or whichever DB and use the local
indexing tools for retrieval.  Once you start doing hundreds of
queries I don't see any good reason to be doing the query against NCBI
directly, given the unreliability of the web and services. Local
databases are faster and more reliable for most people, so I urge you
to take advantage of the tools which provide local database access with
the same APIs.


I would like to comment that the tone of your posts to the list is
not particularly helpful.   I wonder if you are actually asking for
help or just interested in complaining when things don't work as
you expect? This is a collaborative and volunteer-only project, with
the principle of working together to make a useful toolkit.  We
encourage you to build programs and applications from this base that
suit your needs, but not all things will be directly implemented in
the toolkit if they aren't generic enough (at least that is my
feeling, the other Core devs help with these decisions).
   If there is a useful, generic, and reusable part we would like that
to be part of the API. Otherwise we suggest a new application that
fits the developer's vision. We encourage you to write (and publish)
that application separately, but certainly encourage bug (and fix)
submissions and also code contributions for new features where they
can be seen as generally useful.

-jason
On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote:

> This comment could also have a subject line: "Why does Bioperl/get_sequence
> fork at all!  Why are not all operations sequential?"  And if this is a
> "default" mode that I'm unaware of -- How do I ever write a reliable BioPerl
> script if I have little or no control over what the program uses when it
> runs?  I may have days so I can bear the burden of relatively slow results
> (and so can use sequential processing rather than parallel).
>
> I've got a perl script that uses remote blast to blast a sequence against a
> subset of the NCBI sequences.  It "mostly" works, in that it returns a
> seemingly complete .bls result file but when attempting to look at the
> sequences (so it can more accurately summarize the information from the
> results than a standard blast report allows) it terminates prematurely with
> errors.
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::WebDBSeqI::_open_pipe
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
> STACK: Bio::Perl::get_sequence
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
> STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
> STACK: /home/bradbury/Genomes/bin/RB.pl:155
> -----------------------------------------------------------
>
> The precise line (in my code) which appears to be generating the error is:
>    $seq = get_sequence('GenBank', $accsn);
>
> Now this can be a problem if NCBI/GenBank fails due to load conditions,
> but this specific failure (which is repeatable) is most likely due to
> hitting the user process limit restrictions. The small BLAST results work
> fine -- it's only if the BLAST has returned several hundred hits that it
> runs into this problem.
>
> Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
> queries (to get a sequence) with complete disregard of the environment
> (process limits, NCBI limits, etc.).  But I do not know enough about how
> this works to point a finger at some specific function.  As a result
> get_sequence process results are accumulated, summarized, etc. without ever
> having issued the respective "wait-variant()" calls to collect former
> children. [This IMO would clearly be a bug.]
>
> It could be adjusted by allowing the BioPerl library to run in 3 modes.
> (1) completely synchronous -- if you fork you wait until it's done -- and
> you collect "it" and any fork fails then one either collects the process or
> switches to the non-conservative mode.
>
> Robert

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From Jonas_Schaer at gmx.de  Tue Dec  8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read in some FASTA files. At first it worked fine, but now I have some FASTA files in a slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.


Is there any way to use these FASTA files with different line lengths with this fasta.pm module, or will I have to change the format of my FASTA files (big databases...)?

Thanks in advance for any help! 

Regards, Jonas

From awitney at sgul.ac.uk  Tue Dec  8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID: <DB3D347F-EB9E-4A59-87D2-3E1A5FACF154@sgul.ac.uk>

Hi,

I have been generating some trees with Phylip (pars) and then  
processing them with Bioperl. These trees are generated by comparing  
multiple strains of a bacterial organism by presence/absence (0/1)  
calls for each gene.

I was wondering if there was any package in Bioperl to try to
determine if any specific genes were associated with specific branches
of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam

From jason at bioperl.org  Tue Dec  8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>

You can run sreformat (HMMER) or the bp_sreformat.pl script in
scripts/utilities (it is also installed when you install the BioPerl
scripts):
$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or
$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa
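
If neither tool is at hand, the same re-wrapping can be sketched in
plain Perl with no BioPerl dependency. This is only an illustration
under stated assumptions: rewrap_fasta, the script name rewrap.pl, and
the width of 60 are hypothetical choices, not BioPerl code. It makes
every sequence line except the last of each record the same length,
which is what the Bio::DB::Fasta indexer in the error message asks for.

```perl
use strict;
use warnings;

# Rewrite FASTA so that every sequence line (except the last line of a
# record) has exactly $width characters. One record is buffered at a time.
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 60;
    my $seq = '';
    my $flush = sub {
        # Emit the buffered sequence in fixed-width chunks.
        print {$out} "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;            # strip newline, CR, trailing blanks
        next if $line eq '';
        if ($line =~ /^>/) {           # header: finish the previous record
            $flush->();
            print {$out} "$line\n";
        }
        else {
            $line =~ s/\s+//g;         # drop any whitespace inside the line
            $seq .= $line;
        }
    }
    $flush->();                        # finish the final record
}

# Usage: perl rewrap.pl in.fa > out.fa
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    rewrap_fasta($fh, \*STDOUT, 60);
    close $fh;
}
```

Because a whole record is held in memory before it is written, very
long single sequences need correspondingly more memory.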


-jason
On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:

> Hi there,
> I have a little question concerning BioPerl. I have
> BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read
> in some FASTA files. At first it worked fine, but now I have some
> FASTA files in a slightly different format (not all lines have the same
> length!).
>
> ------------- EXCEPTION -------------
> MSG: Each line of the fasta entry must be the same length except the  
> last.
>    Line above #49 '
> ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ 
> Fasta.pm:771
> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
> STACK main::readfasta blast_eval.pm:174
> STACK toplevel blast_eval.pm:83
> -------------------------------------
>
> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ 
> site/lib/Bio/
> DB/Fasta.pm line 1054.
>
>
> Is there any way to use these FASTA files with different line lengths
> with this fasta.pm module, or will I have to change the format
> of my FASTA files (big databases...)?
>
> Thanks in advance for any help!
>
> Regards, Jonas

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Tue Dec  8 23:30:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Dec 2009 22:30:26 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference
Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu>

All,

For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego.  This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13.  The exact day and time are somewhat flexible depending on attendees' schedules.

For those interested, sign up here:

http://www.bioperl.org/wiki/GMOD_2010_Meeting

For those interested in attending the GMOD meeting or PAG:

http://gmod.org/wiki/January_2010_GMOD_Meeting

I can envision the following items popping up:

* Refactoring of Alignment and GFF3/FeatureIO
* Addressing BioPerl's monolithic nature
* Moose and Perl 6
* Documentation

Any others?

chris

From akarger at CGR.Harvard.edu  Wed Dec  9 10:01:45 2009
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 9 Dec 2009 10:01:45 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>

> Is there any way to use these FASTA files with different line lengths
> with this fasta.pm module, or will I have to change the format
> of my FASTA files (big databases...)?
> 

Jonas,

It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back.

To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy).

The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like).

Let me know if you have a problem.

-Amir Karger
Life Sciences Research Computing, FAS IT
Harvard University


From Kevin.M.Brown at asu.edu  Wed Dec  9 10:26:22 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 9 Dec 2009 08:26:22 -0700
Subject: [Bioperl-l] fasta format
In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>

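Even easier to accomplish in one step: read in the fasta file and write
it straight out to another fasta file with SeqIO.

use Bio::SeqIO;

# Each record is read and written back out; the fasta writer re-wraps
# the sequence lines to a uniform width.
my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $file);
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fasta');
while (my $seq = $in->next_seq) { $out->write_seq($seq); }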

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, December 09, 2009 8:02 AM
> To: Jonas Schaer; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
> 
> > Is there any way to use these FASTA files with different line lengths
> > with this fasta.pm module, or will I have to change the format
> > of my FASTA files (big databases...)?
> > 
> 
> Jonas,
> 
> It's not Bioperl, but for a quick fix you can use the 
> Scriptome. Use the change_fasta_to_tab script 
> (http://sysbio.harvard.edu/csb/resources/computational/scripto
> me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> format__change_fasta_to_tab_) to change your FASTA into a 
> tab-delimited file. Then use the next tool 
> (change_tab_to_fasta) to change your files back.
> 
> To use a tool: change the input and output file names on the 
> website, then cut and paste the Perl script from the green 
> box into a CMD window. The script works one sequence at a 
> time, so it doesn't need a lot of memory. (As long as you 
> have enough disk space to store the tab-delimited copy).
> 
> The recreated FASTAs will be 60 characters per line (although 
> you can hand-edit the line after you paste it to be whatever 
> number of characters you'd like).
> 
> Let me know if you have a problem.
> 
> -Amir Karger
> Life Sciences Research Computing, FAS IT
> Harvard University
> 
> 
> 


From Russell.Smithies at agresearch.co.nz  Wed Dec  9 14:44:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Dec 2009 08:44:41 +1300
Subject: [Bioperl-l] fasta format
In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
	<1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>

It's even easier as the script is already written for you :-)

bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa


--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kevin Brown
> Sent: Thursday, 10 December 2009 4:26 a.m.
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
> 
> Even easier to accomplish in one step. Read in the fasta file and output
> it right to another fasta file with SeqIO
> 
> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
> while (my $seq = $in->next_seq){$out->write_seq($seq);}
> 
> Kevin Brown
> Center for Innovations in Medicine
> Biodesign Institute
> Arizona State University
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> > Sent: Wednesday, December 09, 2009 8:02 AM
> > To: Jonas Schaer; bioperl-l at bioperl.org
> > Subject: Re: [Bioperl-l] fasta format
> >
> > > Is there any way to use these FASTA files with different line lengths
> > > with this fasta.pm module, or will I have to change the format
> > > of my FASTA files (big databases...)?
> > >
> >
> > Jonas,
> >
> > It's not Bioperl, but for a quick fix you can use the
> > Scriptome. Use the change_fasta_to_tab script
> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> > format__change_fasta_to_tab_) to change your FASTA into a
> > tab-delimited file. Then use the next tool
> > (change_tab_to_fasta) to change your files back.
> >
> > To use a tool: change the input and output file names on the
> > website, then cut and paste the Perl script from the green
> > box into a CMD window. The script works one sequence at a
> > time, so it doesn't need a lot of memory. (As long as you
> > have enough disk space to store the tab-delimited copy).
> >
> > The recreated FASTAs will be 60 characters per line (although
> > you can hand-edit the line after you paste it to be whatever
> > number of characters you'd like).
> >
> > Let me know if you have a problem.
> >
> > -Amir Karger
> > Life Sciences Research Computing, FAS IT
> > Harvard University
> >
> >
> >
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From maj at fortinbras.us  Wed Dec  9 15:18:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 9 Dec 2009 15:18:08 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
	<18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife>

$ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, 
">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas

----- Original Message ----- 
From: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
To: "'Kevin Brown'" <Kevin.M.Brown at asu.edu>; <bioperl-l at bioperl.org>
Sent: Wednesday, December 09, 2009 2:44 PM
Subject: Re: [Bioperl-l] fasta format


> It's even easier as the script is already written for you :-)
>
> bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa
>
>
> --Russell
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown
>> Sent: Thursday, 10 December 2009 4:26 a.m.
>> To: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] fasta format
>>
>> Even easier to accomplish in one step. Read in the fasta file and output
>> it right to another fasta file with SeqIO
>>
>> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
>> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
>> while (my $seq = $in->next_seq){$out->write_seq($seq);}
>>
>> Kevin Brown
>> Center for Innovations in Medicine
>> Biodesign Institute
>> Arizona State University
>>
>> > -----Original Message-----
>> > From: bioperl-l-bounces at lists.open-bio.org
>> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
>> > Sent: Wednesday, December 09, 2009 8:02 AM
>> > To: Jonas Schaer; bioperl-l at bioperl.org
>> > Subject: Re: [Bioperl-l] fasta format
>> >
>> > > Is there any way to use these FASTA files with different line lengths
>> > > with this fasta.pm module, or will I have to change the format
>> > > of my FASTA files (big databases...)?
>> > >
>> >
>> > Jonas,
>> >
>> > It's not Bioperl, but for a quick fix you can use the
>> > Scriptome. Use the change_fasta_to_tab script
>> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
>> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
>> > format__change_fasta_to_tab_) to change your FASTA into a
>> > tab-delimited file. Then use the next tool
>> > (change_tab_to_fasta) to change your files back.
>> >
>> > To use a tool: change the input and output file names on the
>> > website, then cut and paste the Perl script from the green
>> > box into a CMD window. The script works one sequence at a
>> > time, so it doesn't need a lot of memory. (As long as you
>> > have enough disk space to store the tab-delimited copy).
>> >
>> > The recreated FASTAs will be 60 characters per line (although
>> > you can hand-edit the line after you paste it to be whatever
>> > number of characters you'd like).
>> >
>> > Let me know if you have a problem.
>> >
>> > -Amir Karger
>> > Life Sciences Research Computing, FAS IT
>> > Harvard University
>> >
>> >
>> >


From kellert at ohsu.edu  Wed Dec  9 19:36:13 2009
From: kellert at ohsu.edu (Tom Keller)
Date: Wed, 9 Dec 2009 16:36:13 -0800
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>

Greetings,
Is there a simple way to map a list of ensembl ids to the NCBI gis?

thanks,
Tom

Thomas (Tom) Keller
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/


From cjfields at illinois.edu  Wed Dec  9 20:59:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 9 Dec 2009 19:59:37 -0600
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu>

Tom,

Probably best to do this via BioMart:

http://www.ensembl.org/biomart/

I would assume you can also do this via the ensembl perl API as well.

Also, have a look at the UniProt ID Mapper:

http://www.uniprot.org/?tab=mapping

chris

On Dec 9, 2009, at 6:36 PM, Tom Keller wrote:

> Greetings,
> Is there a simple way to map a list of ensembl ids to the NCBI gis?
> 
> thanks,
> Tom
> 
> Thomas (Tom) Keller
> kellert at ohsu.edu
> 503.494.2442
> 6339b R Jones Hall (BSc/CROET)
> www.ohsu.edu/xd/research/research-cores/dna-analysis/
> 
> 


From lovebaby39 at gmail.com  Thu Dec 10 09:22:14 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 22:22:14 +0800
Subject: [Bioperl-l] about bioperl issue
Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC>

Dear 

The following is code. 


--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
            'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  =>"test_query",
                       -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

 while (my $result_rb =  $blast_report_rb-> next_result ) {
  while (my $hit_rb = $result_rb->next_hit()){
   while (my $hsp_rb = $hit_rb->next_hsp()){
    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
    #print " ",$hit->name,"\n";
   }
  }
 }

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and "score", but I don't know how to get the string shown in red (or please see the attachment).
------------------------------------------------------------------------------------------------------------------ 
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------ 

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091210/0431bad7/attachment.txt>

From SMarkel at accelrys.com  Thu Dec 10 09:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red but the three strings in the
pairwise alignment display can be obtained from an HSP using

    $hsp->query_string()
    $hsp->hit_string()
    $hsp->homology_string()
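
A minimal sketch of how these three calls might be used, offered as an
illustration rather than a definitive script. The helper
format_alignment, the script name show_hsps.pl, and reading a saved
report through Bio::SearchIO are assumptions; in the quoted code the
report object comes from blastall instead, and the same loop over
results, hits and HSPs applies.

```perl
use strict;
use warnings;

# Lay out the three alignment strings the way a BLAST report shows them.
# (Hypothetical helper, not part of BioPerl.)
sub format_alignment {
    my ($query, $homology, $hit) = @_;
    return sprintf "%-6s %s\n%-6s %s\n%-6s %s\n",
        'Query:', $query, '', $homology, 'Sbjct:', $hit;
}

# Usage: perl show_hsps.pl report.bls
if (@ARGV) {
    require Bio::SearchIO;             # needs BioPerl installed
    my $report = Bio::SearchIO->new(-format => 'blast',
                                    -file   => $ARGV[0]);
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print $hit->name, "\n",
                      format_alignment($hsp->query_string,
                                       $hsp->homology_string,
                                       $hsp->hit_string), "\n";
            }
        }
    }
}
```

The query line of the pairwise display is query_string(), the subject
line is hit_string(), and the line of match bars between them is
homology_string().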

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hsueh
Sent: Thursday, 10 December 2009 6:22 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] about bioperl issue
Importance: High

Dear 

The following is code. 


--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
            'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  =>"test_query",
                       -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

 while (my $result_rb =  $blast_report_rb-> next_result ) {
  while (my $hit_rb = $result_rb->next_hit()){
   while (my $hsp_rb = $hit_rb->next_hsp()){
    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
    #print " ",$hit->name,"\n";
   }
  }
 }

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and "score", but I don't know how to get the string shown in red (or please see the attachment).
------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------ 

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh


From David.Messina at sbc.su.se  Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.


Dave


From David.Messina at sbc.su.se  Thu Dec 10 10:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for,  I think.

Namely:
   $hsp->query_string()
OR
   $hsp->hit_string()


Dave


On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
> 
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
> 
> Thank you
> 
> Reginald Hsueh
> 
> ------------------------------------------------------------------------------------------------------------------------------
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>            |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
> ------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> 
> --------------------------------------------------
> From: "Dave Messina" <David.Messina at sbc.su.se>
> Sent: Thursday, December 10, 2009 11:09 PM
> To: "Hsueh" <lovebaby39 at gmail.com>
> Cc: <bioperl-l at bioperl.org>
> Subject: Re: [Bioperl-l] about bioperl issue
> 
>> Hi Reginald,
>> 
>> None of the words in your email or the attachment are colored red; unfortunately any kind of formatting tends to get removed from emails sent to mailing lists.
>> 
>> Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.
>> 
>> 
>> Dave


From lovebaby39 at gmail.com  Thu Dec 10 10:53:00 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 23:53:00 +0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <AEA3314B45B14452A4BD1E3A2235AA5D@SHAPC>

Dear Dave Messina

Thank you for your replies.

Reginald Hsueh

--------------------------------------------------
From: "Dave Messina" <David.Messina at sbc.su.se>
Sent: Thursday, December 10, 2009 11:36 PM
To: "Hsueh" <lovebaby39 at gmail.com>
Cc: <bioperl-l at bioperl.org>
Subject: Re: [Bioperl-l] about bioperl issue

> Hi Reginald,
>
> Please keep all replies on the list so that everyone can follow the 
> thread.
>
> In a separate email, Scott gave the answer you were looking for,  I think.
>
> Namely:
>   $hsp->query_string()
> OR
>   $hsp->hit_string()
>
>
>
> Dave
>
>
>
>
> On Dec 10, 2009, at 16:31, Hsueh wrote:
>
>> Dear Dave Messina
>>
>> I need to get the string that is 
>> "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
>>
>> Thank you
>>
>> Reginald Hsueh
>>
>> ------------------------------------------------------------------------------------------------------------------------------
>> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>>                  |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
>> Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
>> ------------------------------------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>> --------------------------------------------------
>> From: "Dave Messina" <David.Messina at sbc.su.se>
>> Sent: Thursday, December 10, 2009 11:09 PM
>> To: "Hsueh" <lovebaby39 at gmail.com>
>> Cc: <bioperl-l at bioperl.org>
>> Subject: Re: [Bioperl-l] about bioperl issue
>>
>>> Hi Reginald,
>>>
>>> None of the words in your email or the attachment are colored red;
>>> unfortunately, any kind of formatting tends to get removed from emails
>>> sent to mailing lists.
>>>
>>> Could you be more specific about what part of the blast report you are 
>>> not able to get? You could even just copy and paste that particular bit 
>>> of the report into your reply if it's not clear what to call it.
>>>
>>>
>>> Dave


>>>>Dear
>>>>
>>>>The following is code.
>>>>
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>my @params_rb = ( 'program'  => 'blastn',
>>>>            'database' => 'DB\\RB_GUS\\RB_GUS');
>>>>my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);
>>>>
>>>>my $input_rb = Bio::Seq->new(-id  =>"test_query",
>>>>                       -seq => $testline2);
>>>>my $blast_report_rb = $factory_rb->blastall($input_rb);
>>>>
>>>> while (my $result_rb =  $blast_report_rb-> next_result ) {
>>>>  while (my $hit_rb = $result_rb->next_hit()){
>>>>   while (my $hsp_rb = $hit_rb->next_hsp()){
>>>>    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " 
>>>> , $hsp_rb->score , "\n" ;
>>>>    #print " ",$hit->name,"\n";
>>>>   }
>>>>  }
>>>> }
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>
>>>>I know how to get "name", "evalue" and  "score", but I don't know how 
>>>>to get the word which is in red color. (or please see attachment.)
>>>>------------------------------------------------------------------------------------------------------------------
>>>>Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>>>>                   |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
>>>>Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
>>>>------------------------------------------------------------------------------------------------------------------
>>>>
>>>>I will appreciate if you could tell me how to do it.
>>>>Thank you.
>>>>
>>>>Reginald Hsueh 


From pg4 at sanger.ac.uk  Thu Dec 10 15:50:40 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT)
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
References: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
Message-ID: <alpine.DEB.1.10.0912102042180.8440@deskpro17122.dynamic.sanger.ac.uk>


If you are mapping Ensembl genes to NCBI genes (via the Ensembl API or BioMart),
please read this recent thread at ensembl-dev:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html

It seems that the Ensembl gene mapping to NCBI is done through translation, so
noncoding genes do not have a corresponding NCBI gene mapped.


   -Pablo


> ------------------------------
>
> Message: 4
> Date: Wed, 9 Dec 2009 19:59:37 -0600
> From: Chris Fields <cjfields at illinois.edu>
> Subject: Re: [Bioperl-l] how to map ensembl id to NCBI gi
> To: Tom Keller <kellert at ohsu.edu>
> Cc: BioPerl-List <bioperl-l at bioperl.org>
> Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C at illinois.edu>
> Content-Type: text/plain; charset=us-ascii
>
> Tom,
>
> Probably best to do this via BioMart:
>
> http://www.ensembl.org/biomart/
>
> I would assume you can also do this via the ensembl perl API as well.
>
> Also, have a look at the UniProt ID Mapper:
>
> http://www.uniprot.org/?tab=mapping
>
> chris
>
> On Dec 9, 2009, at 6:36 PM, Tom Keller wrote:
>
>> Greetings,
>> Is there a simple way to map a list of ensembl ids to the NCBI gis?
>>
>> thanks,
>> Tom
>>
>> Thomas (Tom) Keller
>> kellert at ohsu.edu
>> 503.494.2442
>> 6339b R Jones Hall (BSc/CROET)
>> www.ohsu.edu/xd/research/research-cores/dna-analysis/
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin at sanger.ac.uk
====================================================================


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From umjsm at leeds.ac.uk  Fri Dec 11 11:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to write
a chain of a PDB file to a file. I have tried a lot of things, but what I think
should work is the following script:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');

for my $chain ($struc->get_chains) {
	if($chain->id eq "A"){
		$new_entry->chain($chain);
		last;
	}
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');#
$out->write_structure($new_entry);

It doesn't. I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of the
Bio::Structure::Entry object requires as input an object of type Chain.

Any solution will be very welcome.

best regards,
Joan


From wkretzsch at gmail.com  Fri Dec 11 14:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files
	generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community.  I've created a perl module that
reads in msOUT files generated by Hudson's ms.  As far as I
understand, there is no SeqIO module to read and output these files?
If so, I propose to create a module that does this.  Any suggestions?

Thanks,
Warren Kretzschmar

From maj at fortinbras.us  Fri Dec 11 14:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT
	filesgenerated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,
I say go for it. You'll want to have a look at
http://bio.perl.org/wiki/Advanced_BioPerl
which explains most of our tips and "policies" for prospective
code contributors, as well as
http://bio.perl.org/wiki/HOWTO:SeqIO
which details SeqIO from the user's perspective. Look
carefully at some Bio::SeqIO::* modules for implementation
details. If you have code to propose, use
http://bugzilla.bioperl.org
and enter a new enhancement, where you can upload
your module for us to review.
MAJ
----- Original Message ----- 
From: "Warren W. Kretzschmar" <wkretzsch at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 11, 2009 2:22 PM
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT filesgenerated by 
Hudson's ms


> Hi,
> I'm new to the bioperl community.  I've created a perl module that
> reads in msOUT files generated by Hudson's ms.  As far as I
> understand, there is no SeqIO module to read and output these files?
> If so, I propose to create a module that does this.  Any suggestions?
>
> Thanks,
> Warren Kretzschmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From bosborne11 at verizon.net  Fri Dec 11 15:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>

Joan,

It looks to me like the first argument to the add_chain() method has  
to be a Model object, the second is the Chain itself. See Structure/ 
Entry.pm, for example. However if you're seeing some documentation  
that says something else then tell us where, it needs to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which
has one or more Chains. This allows you to build macromolecular
complexes (an Entry), which could have more than one defined protein
or protein complex (Models).
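
To make the hierarchy concrete, here is a minimal, untested sketch that walks an Entry down through its Models, Chains, Residues and Atoms and counts each level. count_levels and the script name are hypothetical; the getters are assumed to behave as in the Bio::Structure::IO synopsis, where the parent object is handed to the Entry's getter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk Entry -> Model -> Chain -> Residue -> Atom and count each level.
# Every getter is called on the Entry and is given the parent object,
# which is the calling pattern shown in the Bio::Structure::IO synopsis.
sub count_levels {
    my ($entry) = @_;
    my %count = (models => 0, chains => 0, residues => 0, atoms => 0);
    for my $model ($entry->get_models) {
        $count{models}++;
        for my $chain ($entry->get_chains($model)) {
            $count{chains}++;
            for my $residue ($entry->get_residues($chain)) {
                $count{residues}++;
                my @atoms = $entry->get_atoms($residue);
                $count{atoms} += @atoms;    # array in numeric context = its size
            }
        }
    }
    return %count;
}

# Usage: perl count_levels.pl 101m.pdb   (script name is hypothetical)
if (@ARGV) {
    require Bio::Structure::IO;    # loaded only when a file is actually read
    my $structio = Bio::Structure::IO->new(-file => $ARGV[0], -format => 'pdb');
    my $entry    = $structio->next_structure;
    my %count    = count_levels($entry);
    print "$_: $count{$_}\n" for qw(models chains residues atoms);
}
```

If that reading is right, a Chain object on its own would not bring its Residues and Atoms along into another Entry; treat that as an assumption to check against Entry.pm rather than a confirmed diagnosis.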

Brian O.

On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:

> Hello,
>
> I am trying to do a very easy thing but I can't get it to work. I want to
> write a chain of a PDB file to a file. I have tried a lot of things, but
> what I think should work is the following script:
>
> use Bio::Structure::IO;
> use strict;
>
> my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> =>
> 'pdb');
> my $struc = $structio->next_structure;
>
> my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
>
> for my $chain ($struc->get_chains) {
> 	if($chain->id eq "A"){
> 		$new_entry->chain($chain);
> 		last;
> 	}
> }
>
> my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> 'pdb');#
> $out->write_structure($new_entry);
>
> It doesn't. I get the following error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: add_chain: first argument needs to be a Model object ()
>
> STACK: Error::throw
> STACK:
> Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> 368
> STACK:
> Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:335
> STACK:
> Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:391
> STACK:
> Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:304
> STACK: read_pdb.pl:10
> -----------------------------------------------------------
>
> As far as I understand the documentation, the chain method of the
> Bio::Structure::Entry object requires as input an object of type Chain.
>
> Any solution will be very welcome.
>
> best regards,
> Joan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From awitney at sgul.ac.uk  Sun Dec 13 16:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you
see after clustering data.

I was wondering if anyone has managed to do this using bioperl? I can
draw the two separately, but can't quite seem to work out how to put the
two together and get the nodes to line up with the correct row of
clustering data.

Is there any particular module to look at?

thanks for any help

adam

From dhwani1030 at gmail.com  Sat Dec 12 15:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>

Hi,
I am very new to BioPerl, though I am somewhat familiar with Perl.

I write my perl programs in Notepad++ and run them in cmd.

Now, I want to run Bioperl programs. I just installed bioperl on my
computer. And I have a program using bioperl modules in Notepad++.

My question is: how do I run these programs? Can they be run in cmd as well, or
do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.

From eric_donaldson at med.unc.edu  Sun Dec 13 18:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID: <f77787b07d66b.4b252f3c@med.unc.edu>

Hello,

Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I used the 

fink install bioperl.pm-588 as I could not get it to install using the perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.


I am a novice at unix and bioperl so I do not know how to troubleshoot this. Would you please help me?

Thank you,

Eric


Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From jason at bioperl.org  Sun Dec 13 20:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f77787b07d66b.4b252f3c@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was  
deprecated several releases ago.
It was replaced with Bio::SearchIO

-jason
On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:

> Hello,
>
> Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I  
> used the
>
> fink install bioperl.pm-588 as I could not get it to install using
> the perl version 5.10.
>
> But now I get an error when trying to run a bioperl script.
>
> Here is the error:
>
> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/ 
> perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / 
> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ 
> darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ 
> Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at  
> blastparser.pl line 8.
> BEGIN failed--compilation aborted at blastparser.pl line 8.
>
>
> I am a novice at unix and bioperl so I do not know how to  
> troubleshoot this, would you please help me?
>
> Thank you,
>
> Eric
>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>_______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Sun Dec 13 23:09:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 20:09:45 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f79059397d7fa.4b255f0b@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>

So did you install perl-5.10, or are you using the system perl?  I'm confused
about whether you actually installed bioperl.pm via fink.

Your @INC or $PERL5LIB does point to /sw/lib/perl5, which is one of the dirs
it would have installed in, but I don't think you actually installed bioperl.

you can try and do:
$ locate Bio/SearchIO.pm

We'll see if any of the other osx/fink gurus are on the list that can  
help or you can install it via CPAN I guess.

-jason
On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:

>
> I actually tried a different blastparser that uses BIO::SearchIO and  
> got the same message:
>
> Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ 
> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / 
> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ 
> darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ 
> Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at  
> blastparser.new.pl line 8.
> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>
> I suspect there is a path problem, but am not savvy enough to know  
> how to fix it.  I am really just a hacker.... I have several scripts  
> that I use regularly and that I know how to modify, but am lost when  
> they don't work...
>
> Thanks for any help,
>
> Eric
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 8:24 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: bioperl-l at bioperl.org
>
>> Hi Eric -
>>
>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>> was
>> deprecated several releases ago.
>> It was replaced with Bio::SearchIO
>>
>> -jason
>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>
>>> Hello,
>>>
>>> Today I downloaded bioperl 1.61 on my new macbook pro using
>> fink.  I
>>> used the
>>>
>>> fink install bioperl.pm-588 as I could not get it to install
>> using
>>> the perl version 5.10.
>>>
>>> But now I get an error when trying to run a bioperl script.
>>>
>>> Here is the error:
>>>
>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>> /sw/lib/
>>> perl5/darwin-thread-multi-2level /sw/lib/perl5
>> /sw/lib/perl5/darwin /
>>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/5.10.0
>> /Library/Perl/5.10.0/
>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>> /Network/Library/
>>> Perl/5.10.0/darwin-thread-multi-2level
>> /Network/Library/Perl/5.10.0 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>> at
>>> blastparser.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>
>>>
>>> I am a novice at unix and bioperl so I do not know how
>> to
>>> troubleshoot this, would you please help me?
>>>
>>> Thank you,
>>>
>>> Eric
>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>>
>> < 
>> eric_donaldson.vcf>_______________________________________________>  
>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Mon Dec 14 00:10:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 21:10:54 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f7a30bbc786b3.4b258092@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>

Eric -
please CC the bioperl list when responding so others can help - I  
can't be the only answerer.

But since your @INC message doesn't include /sw/lib/perl5/5.8.8/, you
would need to make sure that is added to your PERL5LIB.
I expect there are some help docs on the perl sites on how to get your
PATHs in order.

Or you can just install via CPAN which will put it in the right path -  
there are docs on the bioperl website about installing via CPAN.
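
A quick way to see what perl can and cannot find is a small check like the one below. It is only a sketch: find_module is a made-up helper, the directory is the one from your message, and `use lib` is used here as the in-script counterpart of adding a directory to PERL5LIB. Whether modules installed for perl 5.8.8 then load cleanly under the system perl 5.10 is a separate question that this does not answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Directory taken from this thread; change it to wherever Bio/SearchIO.pm lives.
use lib '/sw/lib/perl5/5.8.8';

# Return the full path of the first copy of a module found in @INC,
# or undef if no directory in @INC contains it.
sub find_module {
    my ($module) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    for my $dir (@INC) {
        next if ref $dir;                          # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $relative);
        return $path if -f $path;
    }
    return;
}

for my $module (qw(strict Bio::SearchIO)) {
    my $path = find_module($module);
    print defined $path
        ? "$module found at $path\n"
        : "$module not found in \@INC\n";
}
```

Setting PERL5LIB in the shell startup file has the same effect for every script, without editing each one.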

-jason
On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted  
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ 
> SearchIO.pm  I cannot access it.  Do I need to use the older version  
> of perl?
>
> Would it be better to install with CPAN?  If so, can you send me to  
> a page that has instructions?
>
> Thank you so much!
>
> ERic
>
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 11:10 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: BioPerl List <bioperl-l at bioperl.org>
>
>> So you installed perl-5.10 or using system perl?  I'm
>> confused if you
>> actually installed bioperl.pm or not via fink?
>>
>> It seems like since your @INC or $PERL5LIB points to
>> /sw/lib/perl5
>> which is one of the dirs it would have installed in, but I don't
>> think
>> you actually installed bioperl.
>>
>> you can try and do:
>> $ locate Bio/SearchIO.pm
>>
>> We'll see if any of the other osx/fink gurus are on the list
>> that can
>> help or you can install it via CPAN I guess.
>>
>> -jason
>> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:
>>
>>>
>>> I actually tried a different blastparser that uses
>> BIO::SearchIO and
>>> got the same message:
>>>
>>> Can't locate Bio/SearchIO.pm in @INC (@INC contains:
>> /sw/lib/perl5/
>>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin
>> /
>>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/5.10.0
>> /Library/Perl/5.10.0/
>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>> /Network/Library/
>>> Perl/5.10.0/darwin-thread-multi-2level
>> /Network/Library/Perl/5.10.0 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>> at
>>> blastparser.new.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>>>
>>> I suspect there is a path problem, but am not savvy enough to
>> know
>>> how to fix it.  I am really just a hacker.... I have
>> several scripts
>>> that I use regularly and that I know how to modify, but am
>> lost when
>>> they don't work...
>>>
>>> Thanks for any help,
>>>
>>> Eric
>>>
>>> ----- Original Message -----
>>> From: Jason Stajich <jason at bioperl.org>
>>> Date: Sunday, December 13, 2009 8:24 pm
>>> Subject: Re: [Bioperl-l] problem with install
>>> To: eric_donaldson at med.unc.edu
>>> Cc: bioperl-l at bioperl.org
>>>
>>>> Hi Eric -
>>>>
>>>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>>>> was
>>>> deprecated several releases ago.
>>>> It was replaced with Bio::SearchIO
>>>>
>>>> -jason
>>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Today I downloaded bioperl 1.61 on my new macbook pro using
>>>> fink.  I
>>>>> used the
>>>>>
>>>>> fink install bioperl.pm-588 as I could not get it to install
>>>> using
>>>>> the perl version 5.10.
>>>>>
>>>>> But now I get an error when trying to run a bioperl script.
>>>>>
>>>>> Here is the error:
>>>>>
>>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>>>> /sw/lib/
>>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5
>>>> /sw/lib/perl5/darwin /
>>>>> Library/Perl/Updates/5.10.0
>> /System/Library/Perl/5.10.0/darwin-
>>>>
>>>>> thread-multi-2level /System/Library/Perl/5.10.0
>>>> /Library/Perl/5.10.0/
>>>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>>>> /Network/Library/
>>>>> Perl/5.10.0/darwin-thread-multi-2level
>>>> /Network/Library/Perl/5.10.0 /
>>>>> Network/Library/Perl
>> /System/Library/Perl/Extras/5.10.0/darwin-
>>>>
>>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>>>> at
>>>>> blastparser.pl line 8.
>>>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>>>
>>>>>
>>>>> I am a novice at unix and bioperl so I do not know how
>>>> to
>>>>> troubleshoot this, would you please help me?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>> Eric F. Donaldson, Ph.D.
>>>>> Research Assistant Professor, Ralph Baric Lab
>>>>> University of North Carolina
>>>>> Department of Epidemiology
>>>>>
>>>>>
>>>>>
>>>> <
>>>>
>> eric_donaldson.vcf>_______________________________________________>
>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>>
>>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>> <eric_donaldson.vcf>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From awitney at sgul.ac.uk  Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
References: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
Message-ID: <4B260713.3070402@sgul.ac.uk>


BioPerl programs are just Perl programs, so you should run them in
exactly the same way as your Perl programs, from the command line.

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to BioPerl, though I am somewhat familiar with Perl.
> 
> I write my perl programs in Notepad++ and run them in cmd.
> 
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
> 
> My question is: how do I run these programs? Can they be run in cmd as well, or
> do I use ppm?
> 
> Please help.
> 
> Thanks,
> -Dhwani Gandhi.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From umjsm at leeds.ac.uk  Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't use as an argument an object of type

Bio::Structure::Chain

I get an error like this (-->depends on the argument<--)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------


And if I use a Chain object I get the error that I told you about.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new(-id => 'structure_id');
my $model = Bio::Structure::Model->new(-id => '0');

for my $chain ($struc->get_chains) {
        if ($chain->id eq "A") {
                $new_entry->add_chain($model, $chain);
                last;
        }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);


But I get an empty pdb

HEADER    DEFAULT CLASSIFICATION                  24-JAN-70
stru              
REMARK
1                                                                      
TER       1          A
0                                                      
MASTER                                                                          
END  

I am trying a lot of combinations, but I can't write a single chain into
a file. I don't know what I am doing wrong.

Thanks for helping

regards,
Joan


On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> Joan,
> 
> It looks to me like the first argument to the add_chain() method has  
> to be a Model object, the second is the Chain itself. See Structure/ 
> Entry.pm, for example. However if you're seeing some documentation  
> that says something else then tell us where, it needs to be corrected.
> 
> In Bio::Structure an Entry consists of one or more Models, each of which
> has one or more Chains. This allows you to build macromolecular
> complexes (an Entry), which could have more than one defined protein
> or protein complex (Models).
> 
> Brian O.
> 
> On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> 
> > Hello,
> >
> > I am trying to do a very easy thing but I can't get it to work. I want
> > to write a chain of a pdb to a file. I have tried a lot of things, but
> > what I think should work is the following script:
> >
> > use Bio::Structure::IO;
> > use strict;
> >
> > my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
> > my $struc = $structio->next_structure;
> >
> > my $new_entry = Bio::Structure::Entry->new(-id => 'structure_id');
> >
> > for my $chain ($struc->get_chains) {
> > 	if ($chain->id eq "A") {
> > 		$new_entry->chain($chain);
> > 		last;
> > 	}
> > }
> >
> > my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
> > $out->write_structure($new_entry);
> >
> > It doesn't. I get the following error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
> > STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
> > STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far as I understand the documentation, the method chain of the object
> > Bio::Structure::Entry requires as input an object of type Chain.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan
> >
> 


From fs5 at sanger.ac.uk  Mon Dec 14 07:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and if
I stick in a fake ID line, it warns about the missing sequence and the
fact that the features don't fit on the sequence (of length 0).
Of course it's not difficult to write my own parser but I'm sure there
must be a BioPerl way of doing that that I have just overlooked. Thanks
for your help.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From David.Messina at sbc.su.se  Mon Dec 14 09:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.


Dave


PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
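
For readers who take the "write my own parser" route for bare FT lines, the following is a minimal sketch in plain Perl. It does not use Bio::SeqIO at all. The function name parse_ft_lines and the hash layout (key, location, qualifiers) are made up for this illustration, and the sketch assumes the usual EMBL layout: feature key three spaces after "FT", qualifiers starting with "/", everything else a continuation line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: turn bare EMBL "FT" lines
# into a list of hashes { key, location, qualifiers }.
sub parse_ft_lines {
    my @lines = @_;
    my (@features, $current, $last_tag);

    for my $line (@lines) {
        $line =~ s/\s+$//;                 # drop newline and trailing blanks
        next unless $line =~ /^FT/;

        if ($line =~ /^FT {3}(\S+)\s+(\S.*)$/) {
            # A new feature: key directly after "FT   ", then the location.
            $current = { key => $1, location => $2, qualifiers => {} };
            push @features, $current;
            undef $last_tag;
        }
        elsif ($current && $line =~ m{^FT\s+/([^=\s]+)(?:=(.*))?$}) {
            # A qualifier such as /colour=7 or /product="..."
            $last_tag = $1;
            $current->{qualifiers}{$last_tag} = defined $2 ? $2 : '';
        }
        elsif ($current && $line =~ /^FT\s+(\S.*)$/) {
            # A continuation of the last qualifier value, or of the location.
            if (defined $last_tag) {
                $current->{qualifiers}{$last_tag} .= " $1";
            }
            else {
                $current->{location} .= $1;
            }
        }
    }

    # Remove the surrounding double quotes from qualifier values.
    for my $feature (@features) {
        for my $value (values %{ $feature->{qualifiers} }) {
            $value =~ s/^"//;
            $value =~ s/"$//;
        }
    }
    return @features;
}

my @ft_lines = split /\n/, <<'END';
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
END

for my $feature (parse_ft_lines(@ft_lines)) {
    print join("\t", $feature->{key}, $feature->{location},
               $feature->{qualifiers}{product}), "\n";
}
```

The sketch keeps the location as an unparsed string and stores only one value per qualifier name, so repeated qualifiers would overwrite each other; a real feature table may need more care.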


From eric_donaldson at med.unc.edu  Mon Dec 14 09:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
	<7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID: <f750f0a17830d.4b2603e0@med.unc.edu>

Thank you Jason. I appreciate the help.

Eric

----- Original Message -----
From: Jason Stajich <jason at bioperl.org>
Date: Monday, December 14, 2009 12:10 am
Subject: Re: [Bioperl-l] problem with install
To: eric_donaldson at med.unc.edu
Cc: BioPerl List <bioperl-l at bioperl.org>

> Eric -
> please CC the bioperl list when responding so others can help - I
> can't be the only answerer.
> 
> But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you
> would need to make sure that is added to your PERL5LIB.
> There are some help docs on the perl sites I expect on how to get your
> PATHs in order.
> 
> Or you can just install via CPAN which will put it in the right path -
> there are docs on the bioperl website about installing via CPAN.
> 
> -jason
> On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:
> 
> > Hi Jason,
> >
> > The fink package did not have support for perl 5.10, so I attempted
> > to install the perl 5.8.6 package.
> >
> > When I attempted: locate Bio/SearchIO.pm
> > I got: -bash: $: command not found
> >
> > So even though I can find SearchIO.pm in
> > sw/lib/perl5/5.8.8/Bio/SearchIO.pm I cannot access it. Do I need to use
> > the older version of perl?
> >
> > Would it be better to install with CPAN? If so, can you send me to
> > a page that has instructions?
> >
> > Thank you so much!
> >
> > ERic
> >
> >
> > ----- Original Message -----
> > From: Jason Stajich <jason at bioperl.org>
> > Date: Sunday, December 13, 2009 11:10 pm
> > Subject: Re: [Bioperl-l] problem with install
> > To: eric_donaldson at med.unc.edu
> > Cc: BioPerl List <bioperl-l at bioperl.org>
> >
> >> So you installed perl-5.10 or using system perl? I'm confused if you
> >> actually installed bioperl.pm or not via fink?
> >>
> >> It seems like since your @INC or $PERL5LIB points to /sw/lib/perl5
> >> which is one of the dirs it would have installed in, but I don't think
> >> you actually installed bioperl.
> >>
> >> you can try and do:
> >> $ locate Bio/SearchIO.pm
> >>
> >> We'll see if any of the other osx/fink gurus are on the list that can
> >> help or you can install it via CPAN I guess.
> >>
> >> -jason
> >> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:
> >>
> >>>
> >>> I actually tried a different blastparser that uses Bio::SearchIO and
> >>> got the same message:
> >>>
> >>> Can't locate Bio/SearchIO.pm in @INC (@INC contains:
> >>> /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin
> >>> /Library/Perl/Updates/5.10.0
> >>> /System/Library/Perl/5.10.0/darwin-thread-multi-2level
> >>> /System/Library/Perl/5.10.0
> >>> /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0
> >>> /Network/Library/Perl/5.10.0/darwin-thread-multi-2level
> >>> /Network/Library/Perl/5.10.0 /Network/Library/Perl
> >>> /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level
> >>> /System/Library/Perl/Extras/5.10.0 .) at blastparser.new.pl line 8.
> >>> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
> >>>
> >>> I suspect there is a path problem, but am not savvy enough to know
> >>> how to fix it. I am really just a hacker.... I have several scripts
> >>> that I use regularly and that I know how to modify, but am lost when
> >>> they don't work...
> >>>
> >>> Thanks for any help,
> >>>
> >>> Eric
> >>>
> >>> ----- Original Message -----
> >>> From: Jason Stajich <jason at bioperl.org>
> >>> Date: Sunday, December 13, 2009 8:24 pm
> >>> Subject: Re: [Bioperl-l] problem with install
> >>> To: eric_donaldson at med.unc.edu
> >>> Cc: bioperl-l at bioperl.org
> >>>
> >>>> Hi Eric -
> >>>>
> >>>> Bio::Tools::BPlite is no longer supported in Bioperl - it was
> >>>> deprecated several releases ago.
> >>>> It was replaced with Bio::SearchIO
> >>>>
> >>>> -jason
> >>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> Today I downloaded bioperl 1.61 on my new macbook pro using fink. I
> >>>>> used
> >>>>>
> >>>>> fink install bioperl.pm-588
> >>>>>
> >>>>> as I could not get it to install using the perl version 5.10.
> >>>>>
> >>>>> But now I get an error when trying to run a bioperl script.
> >>>>>
> >>>>> Here is the error:
> >>>>>
> >>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
> >>>>> /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin
> >>>>> /Library/Perl/Updates/5.10.0
> >>>>> /System/Library/Perl/5.10.0/darwin-thread-multi-2level
> >>>>> /System/Library/Perl/5.10.0
> >>>>> /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0
> >>>>> /Network/Library/Perl/5.10.0/darwin-thread-multi-2level
> >>>>> /Network/Library/Perl/5.10.0 /Network/Library/Perl
> >>>>> /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level
> >>>>> /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
> >>>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
> >>>>>
> >>>>>
> >>>>> I am a novice at unix and bioperl so I do not know how to
> >>>>> troubleshoot this, would you please help me?
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>>
> >>>>> Eric F. Donaldson, Ph.D.
> >>>>> Research Assistant Professor, Ralph Baric Lab
> >>>>> University of North Carolina
> >>>>> Department of Epidemiology
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Jason Stajich
> >>>> jason.stajich at gmail.com
> >>>> jason at bioperl.org
> >>>>
> >>>>
> >>>
> >>> Eric F. Donaldson, Ph.D.
> >>> Research Assistant Professor, Ralph Baric Lab
> >>> University of North Carolina
> >>> Department of Epidemiology
> >>>
> >>>
> >>> <eric_donaldson.vcf>
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at gmail.com
> >> jason at bioperl.org
> >>
> >>
> >
> > Eric F. Donaldson, Ph.D.
> > Research Assistant Professor, Ralph Baric Lab
> > University of North Carolina
> > Department of Epidemiology
> >
> >
> > <eric_donaldson.vcf>
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> 
> 

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From umjsm at leeds.ac.uk  Mon Dec 14 11:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
	<1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,


To write a single pdb chain to a file, I had to build a new structure and
add the chain to it residue by residue and atom by atom.

use Bio::Structure::IO;
use Bio::Structure::Entry;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new(-id => 'structure_id');

# Only the first model is copied.
for my $model ($struc->get_models) {
	$new_struct->add_model($model);
	for my $chain ($struc->get_chains) {
		# Skip everything except chain A, wherever it sits in the list.
		next unless $chain->id eq "A";
		$new_struct->add_chain($model, $chain);
		# Re-attach every residue and atom of the chain to the new entry.
		foreach my $res ($struc->get_residues($chain)) {
			$new_struct->add_residue($chain, $res);
			foreach my $atom ($struc->get_atoms($res)) {
				$new_struct->add_atom($res, $atom);
			}
		}
		last;	# chain A has been copied, stop looking
	}
	last;
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_struct);

I suppose that there should be a more elegant way to do it.

If someone knows it and can explain it I will be very grateful.

kind regards, 
Joan



From gowthaman.ramasamy at sbri.org  Mon Dec 14 14:16:32 2009
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 14 Dec 2009 11:16:32 -0800
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
Message-ID: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>


Hi All,
I have a list of GO terms and would like to pull the GO accessions for them.
I can easily do the reverse using get_term("GO::00000051").

But can someone tell me how to get the GO accessions from GO terms, e.g. retrieve the GO accession for "citrulline metabolic process"?


Thanks very much,
Gowtham


From lsbrath at gmail.com  Mon Dec 14 14:41:39 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Mon, 14 Dec 2009 14:41:39 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>

Hello,

I have loaded BioPerl-1.6.0 onto my Mac. When I run my script I get the
following error message:

Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
/sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at project_example.pl line 4.
BEGIN failed--compilation aborted at project_example.pl line 4.

I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
Any ideas?

MEB

From scott at scottcain.net  Mon Dec 14 14:47:05 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 14 Dec 2009 14:47:05 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com>

Hi Mgavi,

I think Jason may have already started helping, but the question is:
is SeqIO.pm anywhere in those directories?  If not, why not?  If so,
why can't the perl you are using find it?  Do you have more than one
instance of perl on your machine (fairly likely if you are using a
fink-installed BioPerl)?  When you execute your script, which perl are
you using?

Scott
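
Scott's questions (which perl is running, and is the module anywhere in its search path?) can be answered with a few lines of plain Perl. This is only a sketch: find_module is a helper name invented here, and it relies on nothing beyond the fact that perl looks for DIR/Bio/SeqIO.pm in each DIR listed in @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every directory from the given list that
# contains the module file (path relative to the directory).
sub find_module {
    my ($relative_path, @dirs) = @_;
    # @INC may also hold code references; only plain directories are checked.
    return grep { !ref($_) && -e "$_/$relative_path" } @dirs;
}

print "perl binary:  $^X\n";
print "perl version: $]\n";

my $module = 'Bio/SeqIO.pm';
my @hits   = find_module($module, @INC);

if (@hits) {
    print "$module found in: $_\n" for @hits;
}
else {
    print "$module not found; searched:\n";
    print "  $_\n" for grep { !ref($_) } @INC;
    print "If BioPerl lives elsewhere, add the directory that contains Bio/\n",
          "with 'use lib' in the script or through PERL5LIB.\n";
}
```

Run it with the same perl that runs the failing script. If the unpacked distribution directory itself was moved into /sw/lib/perl5, the Bio/ directory sits one level too deep and the search above will not find it.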




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net  Mon Dec 14 14:45:35 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 14 Dec 2009 14:45:35 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net>

Mgavi,

So there's a directory called /sw/lib/perl5/Bio? Or is it called  
something else?

Brian O.




From jason at bioperl.org  Mon Dec 14 16:42:09 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 Dec 2009 13:42:09 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <C56E1117A61A4835B8E794D34A157F5B@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>
	<C56E1117A61A4835B8E794D34A157F5B@jonas>
Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org>

You can read the sreformat man page (by Sean Eddy) or use it exactly as I
showed you:

sreformat fasta filename > filename.new

You can also use the first example, which is a BioPerl solution.

-jason
On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote:

> Hi Jason,
> thank you very much for your answer.
> I am sorry to bother you again, but I'm afraid I need some help with
> that because I don't see how to use sreformat.
> I haven't managed to write a script that works.
>
> thank u again :)
> jonas
>
>
> ----- Original Message ----- From: "Jason Stajich" <jason at bioperl.org>
> To: "Jonas Schaer" <Jonas_Schaer at gmx.de>
> Cc: <bioperl-l at bioperl.org>
> Sent: Tuesday, December 08, 2009 6:44 PM
> Subject: Re: [Bioperl-l] fasta format
>
>
>> you can run sreformat (HMMER) or the bp_sreformat.pl script in
>> scripts/utilities (or that is installed when you install the Bioperl
>> scripts)
>> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
>> # rename it back
>> $ mv yournewfile.fa yourfile.fa
>>
>> or
>> $ sreformat fasta yourfile.fa > yournewfile.fa
>> $ mv yournewfile.fa yourfile.fa
>>
>>
>> -jason
>> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:
>>
>>> Hi there,
>>> I have a little question concerning bioperl. I have
>>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
>>> in some fasta files. first it worked fine, but now i have some
>>> fastafiles in slightly different format (not all lines have the same
>>> length!).
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Each line of the fasta entry must be the same length except the last.
>>>   Line above #49 '
>>> ..' is 28 != 101 chars.
>>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
>>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
>>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
>>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
>>> STACK main::readfasta blast_eval.pm:174
>>> STACK toplevel blast_eval.pm:83
>>> -------------------------------------
>>>
>>> indexing was interrupted, so unlinking test.fasta.index at
>>> C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.
>>>
>>>
>>> Is there any way to use these fasta files with different line lengths
>>> with this fasta.pm module, or will I have to change the format
>>> of my fasta files (big databases...)?
>>>
>>> Thanks in advance for any help!
>>>
>>> Regards, Jonas
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Mon Dec 14 20:23:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 14 Dec 2009 19:23:05 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>

All,

The output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:

1) Stockholm Rfam reverses start and end if the strand == -1
          
   chrY/598-1

2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end

   rice-3(+)/16598648-16600199

The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate the simple non-stranded NSE?

chris
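
To make the two variants concrete, here is a small plain-Perl sketch that builds both strings from a name, start, end and strand. The function names nse_swapped and nse_explicit are invented for this illustration and are not Bio::LocatableSeq methods; treating an undefined or zero strand as '+' in the second variant is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variant 1 (Stockholm/Rfam style): start and end are swapped
# when the strand is -1.
sub nse_swapped {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Variant 2 (Gbrowse_syn style): the strand is written explicitly
# after the name, coordinates stay in ascending order.
sub nse_explicit {
    my ($name, $start, $end, $strand) = @_;
    my $symbol = (defined $strand && $strand < 0) ? '-' : '+';
    return "$name($symbol)/$start-$end";
}

print nse_swapped('chrY', 1, 598, -1), "\n";                  # chrY/598-1
print nse_explicit('rice-3', 16598648, 16600199, 1), "\n";    # rice-3(+)/16598648-16600199
```

The first form stays parseable by anything that already splits on "/" and "-", while the second needs parsers to learn about the "(+)" or "(-)" suffix; that is the trade-off described above.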

From bernd.web at gmail.com  Tue Dec 15 03:37:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 15 Dec 2009 09:37:44 +0100
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
	<C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com>

Dear Gowthaman,

A non-BioPerl solution: the Ontology Lookup service at EBI. It also
provides a web service interface.

http://www.ebi.ac.uk/ontology-lookup/

citrulline metabolic process has to be selected from the pull-down
list in the interactive page. This will return the ID (GO:0000052) and
additional info:

definition:       The chemical reactions and pathways involving citrulline,
                  N5-carbamoyl-L-ornithine, an alpha amino acid not found in
                  proteins.
preferred name:   citrulline metabolic process
exact synonym:    citrulline metabolism
subset:           Prokaryotic GO subset
xref_definition:  ISBN:209853 "Oxford Dictionary of Biochemistry and
                  Molecular Biology"

The webservice is described at
http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do


Regards,
Bernd
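
For an offline alternative, a name-to-accession table can also be built by scanning an OBO-format ontology file. The sketch below is plain Perl and does not use GO::Parser; the function name obo_name_to_id is invented here, the embedded stanza is a shortened stand-in for a real file, and the assumption is that a local OBO copy of the Gene Ontology is available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read OBO-format text from a filehandle and return
# a hash reference mapping lower-cased term names to accessions.
sub obo_name_to_id {
    my ($fh) = @_;
    my (%id_for, $in_term, $id);

    while (my $line = <$fh>) {
        $line =~ s/\s+$//;
        if ($line =~ /^\[(\w+)\]$/) {            # stanza header, e.g. [Term]
            $in_term = ($1 eq 'Term');
            undef $id;
        }
        elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif ($in_term && defined $id && $line =~ /^name:\s*(.+)$/) {
            $id_for{ lc $1 } = $id;
        }
    }
    return \%id_for;
}

# Shortened example stanza; a real file has many more tags per term.
my $obo = <<'END';
[Term]
id: GO:0000052
name: citrulline metabolic process
namespace: biological_process
END

open my $fh, '<', \$obo or die "Cannot read OBO text: $!";
my $id_for = obo_name_to_id($fh);
close $fh;

print $id_for->{ lc 'citrulline metabolic process' }, "\n";   # GO:0000052
```

Only the primary name of each term is indexed; synonyms such as "citrulline metabolism" would need the synonym lines to be parsed as well.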



From fs5 at sanger.ac.uk  Tue Dec 15 05:38:40 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 15 Dec 2009 10:38:40 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
	<0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk>

Thanks Dave,
good to know that I haven't overlooked something bleedingly obvious in
Bioperl that already does this :-)
No problem, I have already implemented a simple parser to do it, which
works fine for my files. 
Thanks
Frank


On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote:
> Hi Frank,
> 
> You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12
> 
> Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.
> 
> It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
> 
> 
> Dave
> 
> 
> PS - I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:
> 
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From rmb32 at cornell.edu  Tue Dec 15 10:09:43 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 15 Dec 2009 07:09:43 -0800
Subject: [Bioperl-l] AGI's fpc stuff:  Bio::Map::Physical, Bio::MapIO::fpc,
	etc
Message-ID: <4B27A6B7.6090709@cornell.edu>

Hi all,

Recently I caught an interesting thing related to making GFF files out
of FPC maps built recently using Bio::MapIO::fpc.  All of the 
coordinates in the resulting GFF3 and the sizes of the contigs and 
clones seem to be dilated by 4x from where they should be.

This didn't happen with some earlier FPC datasets I ran through these 
modules.

I haven't gone through any of this very thoroughly, but I notice in 
Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my 
$basepair = 4096', and the routine goes on to use $basepair as a sort of 
multiplier for converting the native physical map units into basepairs 
for GFF-style output.

This makes me wonder if the newer FPC datasets coming out require a 
different $basepair value, maybe 1024?  Are the original authors of 
these modules still around on this list?
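
To make the suspected 4x dilation concrete, here is a minimal sketch of the unit conversion described above. It is not the code from Bio::Map::Physical: units_to_bp and the sample length of 50 map units are made up for illustration, and 1024 is only the guess floated above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Map::Physical): convert native
# physical-map units to basepairs using a fixed multiplier.
sub units_to_bp {
    my ($units, $bp_per_unit) = @_;
    return $units * $bp_per_unit;
}

my $clone_length_units = 50;    # made-up clone length in map units

# 4096 is the value seen in print_gffstyle(); 1024 is the speculative alternative.
my $bp_hardcoded = units_to_bp($clone_length_units, 4096);
my $bp_guess     = units_to_bp($clone_length_units, 1024);

printf "4096 bp/unit: %d bp\n", $bp_hardcoded;
printf "1024 bp/unit: %d bp\n", $bp_guess;
printf "ratio: %dx\n", $bp_hardcoded / $bp_guess;
```

If the newer datasets really do use 1024 basepairs per unit, a hard-coded 4096 would inflate every coordinate and size by the factor of 4 observed.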

Rob

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From tristan.lefebure at gmail.com  Tue Dec 15 12:18:26 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 15 Dec 2009 12:18:26 -0500
Subject: [Bioperl-l] ncurses and bioperl?
Message-ID: <200912151218.26357.tristan.lefebure@gmail.com>

Hello,

(Be careful: the following is a very naive question)

Something that I find myself missing is a simple way to look 
at alignments and trees on remote machines where I don't 
have access to X. Since, 
	(1) one can make wonderful terminal programs like screen 
and emacs by using ncurses, 
	(2) that alignment and tree objects are already well 
handled in bioperl, and 
	(3) that there is a CPAN Curses module; 

doing 1+2+3, may I dream of a curses/bioperl perl program to 
render alignment and trees? I suppose a plain C program 
would be much better, but well I am a biologist...

Thanks,

--Tristan

From jason at bioperl.org  Tue Dec 15 12:50:52 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 15 Dec 2009 09:50:52 -0800
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <AEFA51CB-0070-4A1F-9FE3-DA4810129398@bioperl.org>

Not to say this isn't a good idea, but currently, for curses-style viewing,
I would use retree from PHYLIP for trees. For short-read alignments,
samtools tview or Gambit (MarthLab) works great, and for viewing MSA
alignments there is something like ralee (though it is targeted at RNA
editing):
  http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ 
  http://dx.doi.org/10.1093/bioinformatics/bth489

There are prior examples, so you would be able to learn from them
if you still wanted to roll your own here.

-jason
On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote:

> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curses/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From roy.chaudhuri at gmail.com  Tue Dec 15 12:47:26 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 15 Dec 2009 17:47:26 +0000
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <4B27CBAE.5000303@gmail.com>

Hi Tristan,

Not a Bioperl solution, but retree from the Phylip package displays 
trees in a terminal.

Roy.

On 15/12/2009 17:18, Tristan Lefebure wrote:
> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curses/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan


From nml5566 at gmail.com  Tue Dec 15 16:37:30 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 15 Dec 2009 15:37:30 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>

Is the Bio::Ontology::OBOEngine module working or being currently
maintained? I tried following the documentation in the module:

 use Bio::Ontology::OBOEngine;

 my $parser = Bio::Ontology::OBOEngine->new
               ( -file => "gene_ontology.obo" );

 my $engine = $parser->parse();

But it throws an error when I run the file saying 'Can't locate object
method "parse" '. Does anyone have any experience getting this module
working; or, is there any alternative bioperl module to extract terms and
relationships out of sequence ontology files?

From hlapp at drycafe.net  Tue Dec 15 17:05:10 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 15 Dec 2009 17:05:10 -0500
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>

That shouldn't happen, I suppose, but you're not really supposed to use
the engine directly. Rather, it will be used as a backing parser by the
Bio::OntologyIO parser you choose. Have you tried that route and found
it not to work?

	-hilmar

On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:

> Is the Bio::Ontology::OBOEngine module working or being currently
> maintained? I tried following the documentation in the module:
>
> * use Bio::Ontology::OBOEngine;
>
> my $parser = Bio::Ontology::OBOEngine->new
>               ( -file => "gene_ontology.obo" );
>
> my $engine = $parser->parse();
>
> *But, it throws an error when I run the file saying 'Can't locate  
> object
> method "parse" '. Does anyone have any experience getting this module
> working; or, is there any alternative bioperl module to extract  
> terms and
> relationships out of sequence ontology files?

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Wed Dec 16 04:58:16 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Dec 2009 10:58:16 +0100
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <DB8FB8FF-7DCE-4718-9E17-856F09AE1F46@sbc.su.se>

I'd tend to be inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.)

It's a question of whether to optimize human-readability over machine-readability: option 2 favors the former over the latter, and option 1 the reverse.

Whichever way you go, I think

> a new method that creates this, and deprecate[s] out simple non-stranded NSE

would be great.
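
For reference, the two variants under discussion (option 1 swaps start and end on the minus strand, as in chrY/598-1; option 2 writes Name(strand)/start-end, as in rice-3(+)/16598648-16600199) could both be read by something like the sketch below. parse_nse is a hypothetical helper, not an existing Bio::LocatableSeq method, and the returned hash layout is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for the two stranded NSE variants; not BioPerl API.
# Returns a hash ref with name, start, end (start <= end) and strand,
# or nothing if the string matches neither form.
sub parse_nse {
    my ($nse) = @_;

    # Option 2: Name(strand)/start-end, strand given explicitly.
    if ($nse =~ m{^(.+)\(([+-])\)/(\d+)-(\d+)$}) {
        return {
            name   => $1,
            strand => ($2 eq '+' ? 1 : -1),
            start  => $3,
            end    => $4,
        };
    }

    # Option 1: Name/start-end, with start > end meaning minus strand.
    if ($nse =~ m{^(.+)/(\d+)-(\d+)$}) {
        my ($name, $first, $second) = ($1, $2, $3);
        return $first <= $second
            ? { name => $name, strand =>  1, start => $first,  end => $second }
            : { name => $name, strand => -1, start => $second, end => $first  };
    }

    return;
}

for my $nse ('chrY/598-1', 'rice-3(+)/16598648-16600199') {
    my $parsed = parse_nse($nse) or next;
    printf "%s: name=%s start=%d end=%d strand=%d\n",
        $nse, @{$parsed}{qw(name start end strand)};
}
```

One thing the sketch makes visible: under option 1 a single-position feature (start equal to end) cannot carry a strand at all, whereas option 2 stays unambiguous at the cost of touching the name field.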


Dave


From maj at fortinbras.us  Wed Dec 16 07:51:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 16 Dec 2009 07:51:24 -0500
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>

I'm with Dave; option 1 is cleaner. The only problem might be the automatic 
interpretation of older output as always plus strand, but presumably these would 
have had to record the strandedness explicitly elsewhere, so they would be 
updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 14, 2009 8:23 PM
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes


> All,
>
> The current output for NSE format (Name/Start-End) via 
> Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have 
> seen two variations of NSE that incorporate strandedness:
>
> 1) Stockholm Rfam reverses start and end if the strand == -1
>
>   chrY/598-1
>
> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>
>   rice-3(+)/16598648-16600199
>
> The former breaks fewer things within BioPerl, but the latter seems more 
> explicit.  Any preferences?  Do we want a new method that creates this, and 
> deprecate out simple non-stranded NSE?
>
> chris


From tuco at pasteur.fr  Wed Dec 16 09:14:28 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 15:14:28 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO
	(Genbank)
Message-ID: <4B28EB44.3080006@pasteur.fr>

Hi,

I wrote a small GenBank parser a few months ago, before BioPerl release
1.6.0.
I tried to use my code again, but now the output of my parser is empty.
It looks like the annotation from seqfeatures is not filled in anymore.

Here is the code I used previously:

while(my $seq = $streamer->next_seq()){

     #We only want to retrieve CDS features...
     foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
         print $ofh join("#",
                         $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
                         $feat->annotation()->get_Annotations('gene')
                           ? $feat->annotation()->get_Annotations('gene')      # Gene name
                           : $feat->annotation()->get_Annotations('locus_tag'),
                         $feat->annotation()->get_Annotations('product'),      # Description
                        ),"\n";
     }
}

$feat is a Bio::SeqFeature::Generic object

If I print Dumper($feat->annotation()) here is the output :

$VAR1 = bless( {
                  '_typemap' => bless( {
                                         '_type' => {
                                                      'comment' => 'Bio::Annotation::Comment',
                                                      'reference' => 'Bio::Annotation::Reference',
                                                      'dblink' => 'Bio::Annotation::DBLink'
                                                    }
                                       }, 'Bio::Annotation::TypeManager' ),
                  '_annotation' => {}
                }, 'Bio::Annotation::Collection' );

Have some changes been made to the way the annotation object is populated?

Thanks for any clue, and sorry if my question looks stupid

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From cjfields at illinois.edu  Wed Dec 16 10:09:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 16 Dec 2009 09:09:56 -0600
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <4B28EB44.3080006@pasteur.fr>
References: <4B28EB44.3080006@pasteur.fr>
Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>

Emmanuel,

The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation objects.  The way this was implemented was considered unsatisfactory for various reasons, so we reverted to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print "  tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print "    value: ", $value, "\n";
        }   
    }
}

You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
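
Applied to the CDS loop from the original question, the tag-value approach might look like the sketch below. first_tag_value and cds_line are hypothetical helper names, and the GenBank file is assumed to be passed as the first command-line argument; only has_tag, get_tag_values, primary_tag, get_SeqFeatures and Bio::SeqIO itself are BioPerl calls.

```perl
use strict;
use warnings;

# Hypothetical helper: first value of a tag, or undef if the tag is absent.
# has_tag() is checked first because get_tag_values() throws on a missing tag.
sub first_tag_value {
    my ($feat, $tag) = @_;
    return unless $feat->has_tag($tag);
    my ($value) = $feat->get_tag_values($tag);
    return $value;
}

# Hypothetical helper: build the "#"-joined line from the original script,
# falling back to the locus_tag when there is no gene name.
sub cds_line {
    my ($feat) = @_;
    my $locus   = first_tag_value($feat, 'locus_tag') // '';
    my $gene    = first_tag_value($feat, 'gene')      // $locus;
    my $product = first_tag_value($feat, 'product')   // '';
    return join('#', $locus, $gene, $product);
}

# Only runs when a GenBank file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $streamer = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $streamer->next_seq()) {
        for my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()) {
            print cds_line($feat), "\n";
        }
    }
}
```

Guarding each lookup with has_tag keeps a CDS without a gene or product qualifier from aborting the whole run.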

chris

On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:

> Hi,
> 
> I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0.
> I tried to use my code once again but now the output of my parser is empty.
> It looks like Annotation from seqfeatures is not filled anymore.
> 
> Here is the code I used previously:
> 
> while(my $seq = $streamer->next_seq()){
> 
>    #We only want to retrieve CDS features...
>    foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>        print $ofh join("#",
>                        $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>                        $feat->annotation()->get_Annotations('gene')
>                          ? $feat->annotation()->get_Annotations('gene')      # Gene name
>                          : $feat->annotation()->get_Annotations('locus_tag'),
>                        $feat->annotation()->get_Annotations('product'),      # Description
>                       ),"\n";
>    }
> }
> 
> $feat is a Bio::SeqFeature::Generic object
> 
> If I print Dumper($feat->annotation()) here is the output :
> 
> $VAR1 = bless( {
>                 '_typemap' => bless( {
>                                        '_type' => {
>                                                     'comment' => 'Bio::Annotation::Comment',
>                                                     'reference' => 'Bio::Annotation::Reference',
>                                                     'dblink' => 'Bio::Annotation::DBLink'
>                                                   }
>                                      }, 'Bio::Annotation::TypeManager' ),
>                 '_annotation' => {}
>               }, 'Bio::Annotation::Collection' );
> 
> Have some changes been made into the way annotation object is populated?
> 
> Thanks for any clue and sorry if my question look stupid
> 
> Regards
> 
> Emmanuel
> 
> -- 
> -------------------------
> Emmanuel Quevillon
> Biological Software and Databases Group
> Institut Pasteur
> +33 1 44 38 95 98
> tuco at_ pasteur dot fr
> -------------------------
> 


From tuco at pasteur.fr  Wed Dec 16 10:37:45 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 16:37:45 +0100
Subject: [Bioperl-l] Data missing into Annotation object
 using	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <4B28FEC9.1080509@pasteur.fr>

On 12/16/2009 04:09 PM, Chris Fields wrote:
> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation.  The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>      print "primary tag: ", $feat_object->primary_tag, "\n";
>      for my $tag ($feat_object->get_all_tags) {
>          print "  tag: ", $tag, "\n";
>          for my $value ($feat_object->get_tag_values($tag)) {
>              print "    value: ", $value, "\n";
>          }
>      }
> }
>
> You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
>    
Hi Chris

Thanks for the info.
I have indeed reverted to using $feat->get_tag_values() and it works as
before.
For my small problem I can keep this solution, which is well adapted to
it.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From sung at bio.cc  Wed Dec 16 12:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

Wanted to get pubmed identifier from a PDB file using Bio::Structure,
so hacked the code.
Knew that Bio::Structure::IO::pdb.pm get relevant info from either
'JRNL' or 'REMARK 1'.
However could not see any actual code parsing 'PMID'.

From pdb.pm, what I see:

sub _read_PDB_jrnl {
...
           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

sub _read_PDB_remark_1 {
...
               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

From my script, I did:

($struc->annotation->get_Annotations('reference'))[0]->authors
($struc->annotation->get_Annotations('reference'))[0]->title

or

my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
for my $key (keys %{$hash_ref}) {
   print $key,": ",$hash_ref->{$key},"\n";
}

Any plan to include code that extracts the 'PMID'?
Or did I miss something?
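
As a stopgap outside Bio::Structure, the identifier can be read straight from the file. This is a minimal sketch, assuming the PDB file carries a PMID sub-record in its JRNL block (older files may not have one); pmid_from_pdb is a hypothetical helper, not part of pdb.pm, and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Structure::IO::pdb: scan PDB
# records from a filehandle and return the PMID found in the JRNL block.
sub pmid_from_pdb {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^JRNL\s+PMID\s+(\d+)/;
    }
    return;    # no JRNL PMID record found
}

# Made-up records for illustration only.
my $sample = <<'PDB';
HEADER    HYPOTHETICAL EXAMPLE
JRNL        AUTH   A.N.AUTHOR
JRNL        TITL   AN EXAMPLE TITLE
JRNL        PMID   12345678
PDB

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $pmid = pmid_from_pdb($fh);
close $fh;

print defined $pmid ? "PMID: $pmid\n" : "no PMID found\n";
```

The same pattern match would presumably slot into _read_PDB_jrnl next to the existing AUTH/TITL/REFN branches, but that is untested here.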

Cheers,
Sung

From nml5566 at gmail.com  Wed Dec 16 14:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
	<F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes I did find that and it works very well. Now I'm wondering, is
it possible to search for similar terms using a string instead of a
Bio::Ontology term object? For example, I'd like to search for the synonym:
"transcription start site" and have it return all similar terms. But, it
throws an error if I pass in a simple query like that.
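
One way to search by plain string is to walk the parsed terms and compare names and synonyms directly. This is a minimal sketch, not a built-in BioPerl search: terms_matching is a hypothetical helper, the OBO file and query are assumed to come from the command line, and the case-insensitive substring match is just one possible notion of "similar".

```perl
use strict;
use warnings;

# Hypothetical helper: return the terms whose name or any synonym contains
# $query (case-insensitive). Works on any object offering get_all_terms(),
# whose terms offer name(), identifier() and get_synonyms().
sub terms_matching {
    my ($ontology, $query) = @_;
    my $needle = lc $query;
    return grep {
        my $term = $_;
        my $hits = grep { defined($_) && index(lc($_), $needle) >= 0 }
                        ($term->name, $term->get_synonyms);
        $hits > 0;
    } $ontology->get_all_terms;
}

# Only runs when called as: script.pl ontology.obo "query string"
if (@ARGV == 2) {
    my ($obo_file, $query) = @ARGV;
    require Bio::OntologyIO;
    my $parser = Bio::OntologyIO->new(-format => 'obo', -file => $obo_file);
    while (my $ontology = $parser->next_ontology()) {
        for my $term (terms_matching($ontology, $query)) {
            print join("\t", $term->identifier, $term->name), "\n";
        }
    }
}
```

A linear scan like this is slow on large ontologies but avoids having to build a term object just to ask a question about a string.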

-Nathan

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> That shouldn't happen I suppose, but you're not supposed really to use the
> engine directly. Rather it will be used as a backing parser by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
>        -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
>  Is the Bio::Ontology::OBOEngine module working or being currently
>> maintained? I tried following the documentation in the module:
>>
>> * use Bio::Ontology::OBOEngine;
>>
>> my $parser = Bio::Ontology::OBOEngine->new
>>              ( -file => "gene_ontology.obo" );
>>
>> my $engine = $parser->parse();
>>
>> *But, it throws an error when I run the file saying 'Can't locate object
>> method "parse" '. Does anyone have any experience getting this module
>> working; or, is there any alternative bioperl module to extract terms and
>> relationships out of sequence ontology files?
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>

From cjfields1 at gmail.com  Wed Dec 16 19:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>

Howdy from Google Groups

From cjfields1 at gmail.com  Wed Dec 16 20:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the
creation of a mirror for the BioPerl mail list, if the last post
didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web
interface or via email.  Have fun!

chris

From ocarnorsk138 at gmail.com  Wed Dec 16 20:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
References: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields <cjfiel... at gmail.com> wrote:
> Howdy from Google Groups


From p.j.a.cock at googlemail.com  Thu Dec 17 05:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>

On Dec 17, 1:01 am, Chris Fields <cjfiel... at gmail.com> wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email.  Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of
an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed
to the mailing list to post (since the mailing list normally only
allows subscribers to post)?

Peter


From David.Messina at sbc.su.se  Thu Dec 17 07:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.


> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?


I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.


Dave


From cjfields at illinois.edu  Thu Dec 17 08:21:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 07:21:46 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu>


On Dec 17, 2009, at 6:25 AM, Dave Messina wrote:

> Very nice, Chris and Hilmar! That'll be great.
> 
> 
> 
>> Does this mean a Google Groups user doesn't have to be subscribed
>> to the mailing list to post (since the mailing list normally only
>> allows subscribers to post)?
> 
> 
> I think that's right. From the Google groups page:
> 
>> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
> 
> 
> 
> 
> Dave

Posting is moderated per user to deal with spam.  Hilmar's already a manager/co-owner, and either of us can add more as needed.

chris

From hlapp at drycafe.net  Thu Dec 17 09:52:33 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 17 Dec 2009 09:52:33 -0500
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>


On Dec 17, 2009, at 5:50 AM, Peter wrote:

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post


Yes. They can post through the Google Groups web interface.

The email address for mirrored groups is the one of the list being  
mirrored though, bioperl-l at bioperl.org in this case, and so in order  
to post by email you still have to be subscribed at the bioperl-l  
list. At least that's what the docs at Google say.

I haven't tried yet posting to the group at the bioperl-l at  
googlegroups dot com email under an email address that isn't  
subscribed to bioperl-l at bioperl dot org. Maybe it actually would  
work, contrary to docs.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From jay at jays.net  Thu Dec 17 12:05:24 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 17 Dec 2009 11:05:24 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net>

On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote:
> I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs.

In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. 

Here is the configuration set I recommend:

   http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png

Your membership rolls will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so.

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From robert.bradbury at gmail.com  Thu Dec 17 14:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>

Just to close out the issue of BioPerl forking (in particular for accesses
to external databases through get_sequence, which involve the individual
database sub-modules) and not collecting its children.

As it turns out the code does do an explicit fork, it looks like so the
child process can read from the database while the parent process
manipulates the data as it becomes available.  Now, one could argue that a
threaded model might be better since now threads are fairly standard OS
tools in current environments.

But I couldn't find any functions which actually wait for the forked process
(presumably because they are created for "future" use).  But nor is there
any indication in the pages I've found in most of the documentation (which
is spread across the web) or Wiki that explain that "creating child
processes" is how these functions work and one *needs* to collect those
children after each use or else zombie processes will accumulate, which on
"reasonable" systems with per-user process limits will create problems for
proper program functioning.  Nor (it would appear) does the parent process
set up a SIGCHLD "catcher" which could collect the processes once they exit
(which I expect in the case of "get_sequence" would be after closing of the
socket which actually fetched the sequence from Genbank).

It can be resolved easily enough by adding a call after each use of these
functions:
   use POSIX ":sys_wait_h";   # exports WNOHANG
   $kid = waitpid(-1, WNOHANG);
But typically, as a programmer, I should not be responsible for having to
clean up the leftovers of library calls (unless said cleanup requirements
are clearly documented).


But to a "newbie" using the functions, coming from a functional background
(C), not an OO background (which at least I would tend to view as a wart on
the otherwise robust Perl language), there are two problems:
1. The lack of documentation and examples explaining how the functions work
and how they must be handled at a higher level (by executing explicit wait
system calls).
2. The lack of code in the BioPerl functions to deal with the forked
processes which they create.  Functional programmers have a perspective --
if you create it -- you have to clean it up.  It would appear that in the
transition to OO programming (or perhaps simply for expediency) that detail
was left out of both (either/and) the documentation and the code.  From this
standpoint one could view garbage collectors as being fundamentally evil --
because they gloss over the fact that programmers should know what they are
doing and when they are doing it.

So, everywhere in the documentation where there is a get_sequence call (or
anything which accesses an external database which causes a fork to occur)
there should be a modification as I have outlined above -- or else the code
should be corrected so orphaned children are always collected and not
allowed to accumulate.
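
A minimal sketch of the cleanup described above, assuming a POSIX-like
system and core Perl only. The helper name reap_finished_children is
hypothetical and is not part of BioPerl; it loops over waitpid with WNOHANG
so that every child that has already exited is collected without blocking
the parent.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # exports WNOHANG

# Hypothetical helper (not part of BioPerl): collect every child process
# that has already exited, without blocking the parent.
# Returns the number of children reaped by this call.
sub reap_finished_children {
    my $reaped = 0;

    # waitpid returns the PID of a reaped child, 0 if children exist but
    # none has exited yet, and -1 if there are no children at all.
    while ( waitpid( -1, WNOHANG ) > 0 ) {
        $reaped++;
    }
    return $reaped;
}

# Assumed usage with the Bio::Perl procedural interface:
#   my $seq = get_sequence( 'genbank', $accession );
#   reap_finished_children();
```

A single waitpid call collects at most one child, so looping until nothing
is left to reap is the safer form when several fetches have been made since
the last cleanup.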

From robert.bradbury at gmail.com  Thu Dec 17 15:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>

Oh, yes, in case it was not clear, the fork call which fails is in
DB/WebDBSeqI.pm: line 722
     defined(my $pid = fork)
          or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limits for the
user (due to accumulated background processes which are uncollected).

And it could be resolved by simply executing a simple waitpid call for
prior orphaned children before forking [1].  But such a succinct solution
would violate "functional" programming rules -- clean up what you create --
instead they would tend to fall into the OO camp -- "Oh don't worry the
garbage collector will take care of it".  Green programming is a little less
cavalier.

Robert

1. IMO, a very very real problem with programming today is that there is no
connection between programmers and the cost of their programs.  How many
programmers know the instruction cycle time of their computers, what does an
instruction cost in terms of W consumed, W wasted (heat generation),
fruitless scanning over uncollected zombie processes, etc.  It may be that
only programmers who grew up in the era when CPU cycles were expensive
(300 ns/cycle), and who know what each instruction required in terms of
cycles, consider these perspectives.  Now things (cpu use, processor use,
etc) tend to be swept under the rug and it appears that that is the case
with the standard implementation of bioperl.  The documentation does not
clearly state
that additional sub-processes may be created and need to be collected.  You
are providing a utility that only works "this much".  And guess what -- I
happen to have run into the "this".

From cjfields at illinois.edu  Thu Dec 17 15:25:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:25:56 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <BFDD2A52-FB3D-4CC4-A5BF-C53A3DAC9C41@illinois.edu>

Robert,

I have previously outlined specifically why you are seeing the fork issue, and a possible solution.  IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast.  Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank.  Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime.

Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X.  We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution.

My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'.  What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so.  The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect).  Please keep that in mind.

chris

On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote:

> Just to close out the issue of bioperl forking (in particular accesses to
> external databases through get_sequence) which involves individual database
> sub-modules and not collecting its children.
> 
> As it turns out the code does do an explicit fork, it looks like so the
> child process can read from the database while the parent process
> manipulates the data as it becomes available.  Now, one could argue that a
> threaded model might be better since now threads are fairly standard OS
> tools in current environments.
> 
> But I couldn't find any functions which actually wait for the forked process
> (presumably because they are created for "future" use).  But nor is there
> any indication in the pages I've found in most of the documentation (which
> is spread across the web) or Wiki that explain that "creating child
> processes" is how these functions work and one *needs* to collect those
> children after each use or else zombie processes will accumulate, which on
> "reasonable" systems with per-user process limits will create problems for
> proper program functioning.  Nor (it would appear) does the parent process
> set up a SIGCHLD "catcher" which could collect the processes once they exit
> (which I expect in the case of "get_sequence" would be after closing of the
> socket which actually fetched the sequence from Genbank).
> 
> It can be resolved easily enough by adding a call after each use of these
> functions:
>   $kid = waitpid(-1, WNOHANG);
> But typically, as a programmer, I should not be responsible for having to
> clean up the leftovers of library calls (unless said cleanup requirements
> are clearly documented).
> 
> 
> But to a "newbie" using the functions, coming from a functional background
> (C), not an OO background (which at least I would tend to view as a wart on
> the otherwise robust Perl language), there are two problems
> 1. The lack of documentation and examples explaining how the functions work
> and how they must be handled at a higher level (by executing explicit wait
> system calls).
> 2. The lack of code in the BioPerl functions to deal with the forked
> processes which they create.  Functional programmers have a perspective --
> if you create it -- you have to clean it up.  It would appear that in the
> transition to OO programming (or perhaps simply for expediency) that detail
> was left out of both (either/and) the documentation and the code.  From this
> standpoint one could view garbage collectors as being fundamentally evil --
> because they gloss over the fact that programmers should know what they are
> doing and when they are doing it.
> 
> So, everywhere in the documentation where there is a get_sequence call (or
> anything which accesses an external database which causes a fork to occur)
> there should be a modification as I have outlined above -- or else the code
> should be corrected so orphaned children are always collected and not
> allowed to accumulate.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Thu Dec 17 15:29:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:29:10 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
	<deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
Message-ID: <FF6F8AAD-FBBE-4FAD-BB88-59A779CC7131@illinois.edu>

On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote:

> Oh, yes, in case it was not clear, the fork call which fails is in
> DB/WebDBSeqI.pm: line 722
>     defined(my $pid = fork)
>          or $self->throw("'Couldn't fork: $!");

Okay, that's a bit more helpful.

> And of course that is because Linux has reached the process limits for the
> user (due to accumulated background processes which are uncollected).

Right, but again, we need to check this in a cross-platform compatible way.

> And it could be resolved by simply executing a simple waitpid call for
> prior orphaned children before forking [1].  But such a succinct solution
> would violate "functional" programming rules -- clean up what you create --
> instead they would tend to fall into the OO camp -- "Oh don't worry the
> garbage collector will take care of it".  Green programming is a little less
> cavalier.
> 
> Robert
> 
> 1. IMO, a very very real problem with programming today is that there is no
> connection between programmers and the cost of their programs.  How many
> programmers know the instruction cycle time of their computers, what does an
> instruction cost in terms of W consumed, W wasted (heat generation),
> fruitless scanning over uncollected zombie processes, etc.  It may be that
> only programmers who grew up in the era when CPU cycles were expensive
> (300 ns/cycle), and who know what each instruction required in terms of
> cycles, consider these perspectives.  Now things (cpu use, processor use,
> etc) tend to be swept under the rug and it appears that that is the case
> with the standard implementation of bioperl.  The documentation does not
> clearly state
> that additional sub-processes may be created and need to be collected.  You
> are providing a utility that only works "this much".  And guess what -- I
> happen to have run into the "this".

Um, yeah.  Okay.

chris


From robfsouza at gmail.com  Fri Dec 18 13:07:34 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Fri, 18 Dec 2009 13:07:34 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
Message-ID: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>

Hi,

I've been dealing with an apparent bug in the output of NCBI's BLAST
programs (blastall, blastpgp) which sometimes produces output like the
one below.
I think I've managed to produce a work around for Bioperl blast.pm
parser and would like to contribute it to Bioperl.
The fix is based on blast.pm from the CVS tree (downloaded some months
ago...) and is attached to this message.
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem
to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
          hypothetical protein [Nasonia vitripennis]
          Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
            P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091218/3771d91c/attachment-0001.obj>

From cjfields at illinois.edu  Fri Dec 18 13:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <DC79216C-9DD8-47AE-876F-7BBAEC6C43CB@illinois.edu>

Robson, 

Any chance you could check this against SVN?  We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris


On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
> 
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
> 
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> 
> >gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
>           hypothetical protein [Nasonia vitripennis]
>          Length = 1774
> 
>  Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
>  Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)
> 
> Query: 0   -
> 
> Sbjct: 328 P                                                            328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
>            P PP +   + P       KTK+      K+P  K         +
> Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376
> 
> Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
>           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
> Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432
> 
> Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
>           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
> Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491
> 
> Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
>              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
> Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548
> 
> Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
>           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
> Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602
> 
> Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
>           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
> Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661
> 
> Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
>                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
> Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
> <blast_patched.pm>


From biopython at maubp.freeserve.co.uk  Fri Dec 18 18:00:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 18 Dec 2009 23:00:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>

On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson

Do you have a complete example of this kind of funny output?
This problem has also been reported with blastpgp for the
Biopython parser. I'd love an example for our unit tests
(probably worth doing in BioPerl too). Could you upload a
test case here?:

http://bugzilla.open-bio.org/show_bug.cgi?id=2927

Thanks!

Peter @ Biopython

From biopython at maubp.freeserve.co.uk  Sat Dec 19 06:19:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 11:19:53 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Thank you,

Peter

From maj at fortinbras.us  Sat Dec 19 14:52:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 19 Dec 2009 14:52:45 -0500
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
Message-ID: <F7E9AD08646A44D3AB29A4504A725095@NewLife>

Hi All, 

Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
is at beta in the bioperl-run trunk. It wraps all the programs of the 
NCBI's new blast+-2.2.22 suite 
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
and integrates them, allowing you to create, mask, and query 
databases from within a single factory object. See the HOWTO
http://www.bioperl.org/wiki/HOWTO:BlastPlus
for the usual usage and implementation details.

Happy coding--
MAJ 

From David.Messina at sbc.su.se  Sat Dec 19 15:34:10 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 19 Dec 2009 21:34:10 +0100
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se>

Sweet! Thanks, Mark.


Dave

From cjfields at illinois.edu  Sat Dec 19 17:44:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 16:44:46 -0600
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu>

Very nice!  We'll definitely give it a try here (along with the requisite feedback, of course).

chris

On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote:

> Hi All, 
> 
> Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
> is at beta in the bioperl-run trunk. It wraps all the programs of the 
> NCBI's new blast+-2.2.22 suite 
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and integrates them, allowing you to create, mask, and query 
> databases from within a single factory object. See the HOWTO
> http://www.bioperl.org/wiki/HOWTO:BlastPlus
> for the usual usage and implementation details.
> 
> Happy coding--
> MAJ 


From cjfields at illinois.edu  Sat Dec 19 23:59:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 22:59:38 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
	<6723123C0ABD447190639AE1F5D1A6A7@NewLife>
Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu>

I think option 1 is cleaner as well; very easily added, so committed to main trunk as I consider this a bug, as one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1).  

I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else).
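
A minimal sketch of option 1 (the Rfam convention of reversing start and
end on the minus strand), assuming plain strings and integers. The helper
names format_nse and parse_nse are hypothetical and are not the code
committed to Bio::LocatableSeq; they only illustrate how strand can survive
a round trip through an NSE string.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Name/Start-End string, writing the
# coordinates as end-start when the feature is on the minus strand.
sub format_nse {
    my ( $name, $start, $end, $strand ) = @_;
    ( $start, $end ) = ( $end, $start ) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Hypothetical inverse: recover (name, start, end, strand).
# A first coordinate larger than the second is read as strand -1;
# anything else is assumed to be strand +1.
sub parse_nse {
    my ($nse) = @_;
    my ( $name, $first, $second ) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or return;
    return $first > $second
        ? ( $name, $second, $first, -1 )
        : ( $name, $first, $second, 1 );
}

print format_nse( 'chrY', 1, 598, -1 ), "\n";    # prints chrY/598-1
```

One limitation of this convention is that a single-position feature (start
equal to end) carries no strand information in the string.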

chris

On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote:

> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Monday, December 14, 2009 8:23 PM
> Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
> 
> 
>> All,
>> 
>> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:
>> 
>> 1) Stockholm Rfam reverses start and end if the strand == -1
>> 
>>  chrY/598-1
>> 
>> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>> 
>>  rice-3(+)/16598648-16600199
>> 
>> The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
>> 
>> chris


From e.osimo at gmail.com  Sun Dec 20 13:19:37 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Sun, 20 Dec 2009 19:19:37 +0100
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>

Hello everyone,
I have a very particular problem: I'd like to draw in a single track
different SNPs with a glyph that allows me to see graphically their
importance.
For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like to have
the first depicted small and the last one big, with the ones in between sized
accordingly.
I'd also be satisfied with a color gradient.
What I cannot do is set an option such as -height in the
Bio::SeqFeature::Generic->new call that I use for each of my objects, rather
than in the add_track section.
If I set it in the add_track section, all the glyphs are then of the same
size (or color).
If, otherwise, I add a different track for each object, my picture becomes
too big.

Please, help!
Thanks
Emanuele

From ajmackey at gmail.com  Sun Dec 20 13:41:14 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Sun, 20 Dec 2009 13:41:14 -0500
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com>

You can set the height as a callback sub, rather than a constant -- the
callback will get passed the feature about to be drawn, from which you can
calculate the "importance", and return the desired height, dynamically.
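
A minimal sketch of such a callback, assuming the importance is stored as
the feature's score on a 1 to 10 scale. The sub name snp_height and the
pixel values are hypothetical; the only thing it relies on is that the
feature passed in answers a score method, as Bio::SeqFeature::Generic
objects do.

```perl
use strict;
use warnings;

# Hypothetical callback: map an importance score (assumed range 1..10)
# to a glyph height in pixels. Missing or out-of-range scores are clamped.
sub snp_height {
    my ($feature) = @_;
    my $score = $feature->score;
    $score = 1  if !defined($score) || $score < 1;
    $score = 10 if $score > 10;
    return 4 + 2 * $score;    # 6 px for score 1, 24 px for score 10
}

# Assumed usage, with $panel a Bio::Graphics::Panel and @snps a list of
# features created with -score => <importance>:
#
#   $panel->add_track(\@snps,
#       -glyph  => 'generic',
#       -height => \&snp_height,
#   );
```

The same pattern should work for a color gradient by passing a callback
for -bgcolor that returns a color name chosen from the score.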

-Aaron

On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo <e.osimo at gmail.com> wrote:

> Hello everyone,
> I have a very particular problem: I'd like to draw in a single track
> different SNPs with a glyph that allows me to see graphically their
> importance.
> For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the
> first depicted small, and the last one big, with the ones in between with
> according sizes.
> I'd be satisfied also with a color gradient.
> What I cannot do is to set the option -height , for example, instead than
> in
> the add_track section, in the Bio::SeqFeature::Generic->new that I use for
> each of my objects.
> If I set it in the add_track section, all the glyphs are then of the same
> size (or color).
> If, otherwise, I add a different track for each object, my picture becomes
> too big.
>
> Please, help!
> Thanks
> Emanuele

From robfsouza at gmail.com  Sat Dec 19 06:06:16 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 19 Dec 2009 06:06:16 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>

Hi Peter,

I just uploaded my example. I also reported this bug to the NCBI
developers and I hope they can fix it, since it is easy to reproduce.
I just forgot to mention the blastpgp version: 2.2.18
Best,
Robson

On Fri, Dec 18, 2009 at 6:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
> <robfsouza at gmail.com> wrote:
>> Hi,
>>
>> I've been dealing with an apparent bug in the output of NCBI's BLAST
>> programs (blastall, blastpgp) which sometimes produces output like the
>> one below.
>> I think I've managed to produce a work around for Bioperl blast.pm
>> parser and would like to contribute it to Bioperl.
>> The fix is based on blast.pm from the CVS tree (downloaded some months
>> ago...) and is attached to this message.
>> Best,
>> Robson
>
> Do you have a complete example of this kind of funny output?
> This problem has also been reported with blastpgp for the
> Biopython parser. I'd love an example for our unit tests
> (probably worth doing in BioPerl too). Could you upload a
> test case here?:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>
> Thanks!
>
> Peter @ Biopython
>

From biopython at maubp.freeserve.co.uk  Mon Dec 21 10:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of
it "by hand", it would be very tricky to try and parse as a special
case. It really does look like a bug in BLAST to me. The alignment
includes an initial pair, a leading gap in the query (with a coordinate
of zero), plus a residue from the match sequence (with a sensible
coordinate). The alignment statistics include this (extra) pair in
the alignment length.
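
Even if a parser cannot rescue such a record, it could at least flag it. The following is a minimal sketch in plain Perl, assuming the classic pairwise "Query: <start> <sequence> <end>" layout. The sample lines are invented for illustration (they are not taken from the uploaded test case), and suspicious_query_line is a hypothetical helper, not part of blast.pm or the Biopython parser.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a "Query:" alignment line starts at
# coordinate 0 and opens with a gap. BLAST coordinates are 1-based,
# so a start of 0 should never appear in a sane alignment.
sub suspicious_query_line {
    my ($line) = @_;
    return 0 unless $line =~ /^Query:?\s+(\d+)\s+(\S+)/;
    my ($start, $seq) = ($1, $2);
    return ($start == 0 && $seq =~ /^-/) ? 1 : 0;
}

# Invented sample lines (assumption: not real blastpgp output).
my @lines = (
    'Query: 0   -',
    'Query: 1   MKVLAAGIVGL 11',
);
for my $line (@lines) {
    printf "%-30s %s\n", $line,
        suspicious_query_line($line) ? 'SUSPICIOUS' : 'ok';
}
```

A check like this only detects the symptom; deciding what to do with the extra pair is a separate question.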

You said you were using blastpgp version 2.2.18, so I tried this
with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast
-num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.
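
For anyone converting other legacy command lines, the option renames used above can be written down as a small table. This is a rough sketch in plain Perl; %legacy_to_plus and translate_args are illustrative names, and only the options that appear in the blastpgp command above are covered (-parse_deflines has no counterpart in that command and is not handled).

```perl
use strict;
use warnings;

# Legacy blastpgp option => BLAST+ psiblast option.
my %legacy_to_plus = (
    '-d' => '-db',
    '-i' => '-query',
    '-o' => '-out',
    '-a' => '-num_threads',
    '-j' => '-num_iterations',
    '-b' => '-num_alignments',
);

# Translate a flat list of options and values; anything that is not a
# known legacy option (including the values) is passed through unchanged.
sub translate_args {
    my @args = @_;
    return map { exists $legacy_to_plus{$_} ? $legacy_to_plus{$_} : $_ } @args;
}

my @legacy = qw(-d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000);
print join(' ', 'psiblast', translate_args(@legacy)), "\n";
```

The sketch only renames flags; it does not check that the two programs interpret the values identically.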

I might therefore expect the NCBI to say "yes, this is a bug in
the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance
release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost
Christmas/New Year so we may not expect an answer until Jan.

Peter

From maj at fortinbras.us  Mon Dec 21 13:52:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 13:52:01 -0500
Subject: [Bioperl-l] test fail
Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife>

fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)

t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
#          got: '1..4'
#     expected: 'complement(5..8)'

t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
#          got: 'complement(5..8)'
#     expected: '1..4'
# Looks like you failed 2 tests of 51.

MAJ

From cjfields at illinois.edu  Mon Dec 21 14:20:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 13:20:32 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>

Saw that from the other day (LocatableSeq commit).  I'll check it out.

chris

On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:

> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
> 
> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
> #          got: '1..4'
> #     expected: 'complement(5..8)'
> 
> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
> #          got: 'complement(5..8)'
> #     expected: '1..4'
> # Looks like you failed 2 tests of 51.
> 
> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From scott at scottcain.net  Mon Dec 21 15:02:20 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 21 Dec 2009 15:02:20 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>

Hi All,

Today it was pointed out to me that the Bio::Graphics documentation
links on the BioPerl wiki are broken, no doubt because Bio::Graphics
is no longer part of bioperl-core (is that how it should be referred
to?).  Anyway, the question is: what is the right way to rectify this
problem?  Since other things may get broken out in the future, I
suppose we should get some sort of standard established.  Can a
release of Bio::Graphics be placed somewhere on the BioPerl wiki
server to be processed?

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu  Mon Dec 21 15:22:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 14:22:39 -0600
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>

We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links.  Shouldn't be too hard to do.

chris

On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:

> Hi All,
> 
> Today it was pointed out to me that the Bio::Graphics documentation
> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
> is no longer part of bioperl-core (is that how it should be referred
> to?).  Anyway, the question is: what is the right way to rectify this
> problem?  Since other things may get broken out in the future, I
> suppose we should get some sort of standard established.  Can a
> release of Bio::Graphics be placed somewhere on the BioPerl wiki
> server to be processed?
> 
> Thanks,
> Scott
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 16:12:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 15:12:45 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
	<E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
Message-ID: <A396F39A-76BC-44B4-8302-4C622257E6ED@illinois.edu>

'Twas a bad test call.  I basically changed the test to pull each feature directly by its primary tag, check it against the original sf (seqfeature) prior to revcom, then check that the location was revcomp'ed correctly.
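
For reference, the two location strings in the failure report are mirror images of each other under reverse complementation. A minimal sketch in plain Perl, assuming a sequence length of 8 (inferred from the coordinates, not read from SeqUtils.t); revcom_location and location_string are hypothetical helpers, not Bio::SeqUtils methods.

```perl
use strict;
use warnings;

# Map a feature location onto the reverse complement of a sequence of
# length $len: coordinates are mirrored and the strand is flipped.
sub revcom_location {
    my ($start, $end, $strand, $len) = @_;
    return ($len - $end + 1, $len - $start + 1, -$strand);
}

# Render a location in the same style as the test output.
sub location_string {
    my ($start, $end, $strand) = @_;
    my $range = "$start..$end";
    return $strand < 0 ? "complement($range)" : $range;
}

my $len = 8;    # assumption: inferred from the reported coordinates
print location_string(revcom_location(1, 4,  1, $len)), "\n";   # complement(5..8)
print location_string(revcom_location(5, 8, -1, $len)), "\n";   # 1..4
```

So if the two features come back in the opposite order from what the test expects, the "got" and "expected" values swap exactly as reported.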

chris

On Dec 21, 2009, at 1:20 PM, Chris Fields wrote:

> Saw that from the other day (LocatableSeq commit).  I'll check it out.
> 
> chris
> 
> On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:
> 
>> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>> 
>> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
>> #          got: '1..4'
>> #     expected: 'complement(5..8)'
>> 
>> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
>> #          got: 'complement(5..8)'
>> #     expected: '1..4'
>> # Looks like you failed 2 tests of 51.
>> 
>> MAJ


From maj at fortinbras.us  Mon Dec 21 16:27:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:27:25 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife>

I've modified Template:Doclink ; if you now do

{{Doclink|Bio::Graphics|cpan}}

you'll get a page with only the cpan link.

{{Doclink|Bio::SeqIO}}

etc. works as usual.
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 16:34:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:34:40 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife>

Also, applied the new Doclink to Bio::Graphics on wiki.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 21:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--

We didn't plan it, but we added it anyway: see revision 16559 of 
bioperl-live/trunk.
You can then do
$pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;
and even
$doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;
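
For anyone stuck on an older checkout, the identifiers can also be pulled straight from the JRNL records. This is a minimal sketch in plain Perl and is not the code from revision 16559; it assumes the PMID and DOI sub-records each fit on one line, jrnl_ids is a hypothetical helper, and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the PMID and DOI from PDB JRNL records.
# The first occurrence of each sub-record wins.
sub jrnl_ids {
    my (@records) = @_;
    my %ids;
    for my $line (@records) {
        next unless $line =~ /^JRNL\s+(PMID|DOI)\s+(\S+)/;
        my ($key, $value) = (lc $1, $2);
        $ids{$key} = $value unless exists $ids{$key};
    }
    return \%ids;
}

# Invented sample records (assumption: not from a real PDB entry).
my @records = (
    'JRNL        AUTH   A.N.AUTHOR',
    'JRNL        PMID   12345678',
    'JRNL        DOI    10.1000/EXAMPLE.DOI',
);
my $ids = jrnl_ids(@records);
print "pmid: $ids->{pmid}\n";
print "doi:  $ids->{doi}\n";
```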

Thanks for the heads-up!
cheers,
MAJ
----- Original Message ----- 
From: "Sungsam Gong" <sung at bio.cc>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations


> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
>           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
>               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
> for my $key (keys %{$hash_ref}) {
>   print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Any plan to include a code chopping 'PMID' out?
> Or did I miss something?
>
> Cheers,
> Sung


From dan.kortschak at adelaide.edu.au  Mon Dec 21 22:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid
alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
bioperl-run tree).

I have 90% of what I want included in the module and would like some
advice from more experienced bioperlers. Feedback on approach is also
welcomed (this is my first significant wrapper, and it comes after a long
gap from writing modules, so I am rusty). The module has ended up being
significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

     1. Informal tests run correctly (outside the t/ tree and Test
        harness), but formal Test harness tests fail for reasons I
        cannot understand. (The module is still lacking a lot of tests,
        but since things were failing in the harness I have placed them
        as a lower priority and have been working to my micro-script
        tests - yes, bad form.)
     2. I am having a big problem with IPC::Run for one of the
        executables (the module can call 5 different executables for 7
        commands), bowtie-maptool (module command 'map'). All the other
        commands tested (this excludes bowtie-maqconvert [convert
        command]) work fine, but maptool fails with an illegal seek -
        presumably due to the redirection handling? I have no idea how
        to resolve this, so help would be greatly appreciated (a small
        script that demonstrates the use that results in the failure is
        below).

There will be provision for returning a Bio::Assembly::IO object through
samtools in the finished module, but currently the
Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
	-command             => 'single',
	-max_seed_mismatches => 2,
	-seed_length         => 28,
	-max_qual_mismatch   => 70,
	-sam_format          => 0
	);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
	-command             => 'map'
	);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

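# Read the maptool output back in, then print it followed by its line count
open(my $in, '<', $map) or die "Cannot open $map: $!";
my @lines = <$in>;
my $lines = @lines;    # number of lines read
print @lines;
print "\n\n$lines\n";
close $in;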


From maj at fortinbras.us  Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <F7513FBADF944B51823A5F22FFA85911@NewLife>

Hey Dan, 
It looks like if the outfile isn't specified on the command line for
maptool, then the alignment is written to stdout. So, you could
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'     => [qw( ind seq #out )],
    'paired'     => [qw( ind seq seq2 #out )],
    'crossbow'   => [qw( ind seq #out )],
    'build'      => [qw( ref out )],
    'inspect'    => [qw( ind >#out )],
    'convert'    => [qw( bwt out bfa )],
-    'map'        => [qw( bwt #out )]
+    'map'        => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.
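
Some background on the error itself: "Illegal seek" is the message for errno ESPIPE, which the OS returns when a program tries to seek on a pipe or other non-seekable handle. Whether maptool really seeks on one of its handles is an assumption here (I have not checked its source), but the effect is easy to reproduce in plain Perl. can_seek is a hypothetical helper for this demonstration only.

```perl
use strict;
use warnings;
use Errno qw(ESPIPE);
use File::Temp qw(tempfile);

# Hypothetical helper: true if the handle supports seeking.
# sysseek returns undef on failure and sets $!.
sub can_seek {
    my ($fh) = @_;
    return defined sysseek($fh, 0, 0) ? 1 : 0;
}

my ($file_fh) = tempfile(UNLINK => 1);                # a regular file
pipe(my $reader, my $writer) or die "pipe failed: $!";

print "file: ", (can_seek($file_fh) ? "seekable" : "not seekable ($!)"), "\n";
print "pipe: ", (can_seek($writer)  ? "seekable" : "not seekable ($!)"), "\n";
```

If this is what is happening, whether a handle is a real file or a pipe would matter more than which redirection syntax is used to set it up.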

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release 
(http://bowtie-bio.sourceforge.net/index.shtml)...

cheers MAJ

----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module


> Hi,
> 
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
> 
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing module, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
> 
> There are a few issues I'm having, so I apologise for the list:
> 
>     1. Informal tests run correctly (outside the t/ tree and Test
>        harness), but formal Test harness tests fail for reasons I
>        cannot understand. (The module is still lacking a lot of tests,
>        but since things were failing in the harness I have placed them
>        as a lower priority and have been working to my micro-script
>        tests - yes, bad form.)
>     2. I am having a big problem with IPC::Run for one of the
>        executables (the module can call 5 different executables for 7
>        commands), bowtie-maptool (module command 'map'). All the other
>        commands tested (this excludes bowtie-maqconvert [convert
>        command]) work fine, but maptool fails with an illegal seek -
>        presumably due to the redirection handling? I have no idea how
>        to resolve this, so help would be greatly appreciated (a small
>        script that demonstrates the use that results in the failure is
>        below).
> 
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
> 
> Thanks for any help.
> Dan
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::Tools::Run::Bowtie;
> 
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
> 
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
> -command             => 'single',
> -max_seed_mismatches => 2,
> -seed_length         => 28,
> -max_qual_mismatch   => 70,
> -sam_format          => 0
> );
> 
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
> 
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
> -command             => 'map'
> );
> 
> my $map = $bowtiemap->run($align); # throws Illegal seek
> 
> print "$map\n";
> 
> open (IN,$map);
> my $lines =(my @lines)= <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
> 
> 

From dan.kortschak at adelaide.edu.au  Tue Dec 22 00:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <F7513FBADF944B51823A5F22FFA85911@NewLife>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
	<F7513FBADF944B51823A5F22FFA85911@NewLife>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a
specified file and run it that way, but I've tried the redirect as you
suggest, with the same failure result. I think it's a strangeness of
maptool (which may well be a reason for it being dropped - also note the
maptool output doesn't seem reasonable for the test data provided even
when run from the command line).

It's probably the result of a difficult interaction between IPC::Run and
maptool. Any funkiness in your code is unlikely to be the cause: I've
analysed in depth what is being passed to IPC::Run, and I've quite
extensively modified the IPC::Run handling method from your code. The
base code managed a single executable with many commands, whereas bowtie
needs a cluster of executables, each taking a small subset of different
filespecs. My funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module.

(Should test maqconvert - if it fails, this will be dropped also unless
someone asks otherwise).

When the module copes with 0.11.* properly I'll start thinking about
0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan, 
> It looks like if the outfile isn't specified on the command line for
> maptool, then the alignment is written to stdout. So, you could
> try this workaround in Bowtie/Config.pm:
> 
> our %command_files = (
>     'single'     => [qw( ind seq #out )],
>     'paired'     => [qw( ind seq seq2 #out )],
>     'crossbow'   => [qw( ind seq #out )],
>     'build'      => [qw( ref out )],
>     'inspect'    => [qw( ind >#out )],
>     'convert'    => [qw( bwt out bfa )],
> -    'map'        => [qw( bwt #out )]
> +    'map'        => [qw( bwt >#out )]
>     );
> 
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
> 
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release 
> (http://bowtie-bio.sourceforge.net/index.shtml)...
> 
> cheers MAJ


From lovebaby39 at gmail.com  Wed Dec 23 05:48:55 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Wed, 23 Dec 2009 18:48:55 +0800
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>

Dear all

I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how 
to get "P.pastoris DNA for pPIC9K expression vector".

    while (my $result_u =  $blast_report_u-> next_result ) {
        while (my $hit_u = $result_u->next_hit()){
            while (my $hsp_u = $hit_u->next_hsp()){
                    $hit_u->name;
                    $hsp_u->evalue;
                    $hsp_u->score;
            }
        }
    }

I would appreciate it if you could tell me how to do it.

P.S. How can I download the BioPerl manual? (Is there a download link?)


The following is the BLAST result:
-------------------------------------------------------------------------------------------------------------------------------------
BLASTN 2.2.16 [Mar-25-2007]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Query=
         (458 letters)

Database: UniVec (build 4.0)
           2416 sequences; 597,480 total letters
Searching..................................................done
                                                                             
                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1

>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
          Length = 2781

 Score = 26.3 bits (13), Expect = 3.1
 Identities = 13/13 (100%)
 Strand = Plus / Plus

Query: 352  tactaccgccatt 364
            |||||||||||||
Sbjct: 2209 tactaccgccatt 2221
-------------------------------------------------------------------------------------------------------------------------------------

Reginald Hsueh 


From hrh at fmi.ch  Wed Dec 23 10:14:06 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 23 Dec 2009 16:14:06 +0100
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>
Message-ID: <C757F24E.5FE2%hrh@fmi.ch>

Hi

Assuming you are using "SearchIO", try:

$hit_u->description

for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO
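
To make the name/description distinction concrete: the name is the first whitespace-delimited token of the hit's definition line, and the description is everything after it. A minimal sketch in plain Perl showing that split on the top hit from the report below; split_defline is a hypothetical helper for illustration, not how Bio::SearchIO is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA-style definition line into
# (name, description). The description is empty if there is none.
sub split_defline {
    my ($defline) = @_;
    $defline =~ s/^>\s*//;
    my ($name, $desc) = split /\s+/, $defline, 2;
    return ($name, defined $desc ? $desc : '');
}

my ($name, $desc) = split_defline(
    '>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector');
print "name:        $name\n";
print "description: $desc\n";
```

In the loop from the original post, the two values would come from $hit_u->name and $hit_u->description respectively.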


Regards, Hans


On 12/23/09 11:48 AM, "Hsueh" <lovebaby39 at gmail.com> wrote:

> Dear all
> 
> I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how
> to get "P.pastoris DNA for pPIC9K expression vector".
> 
>     while (my $result_u =  $blast_report_u-> next_result ) {
>         while (my $hit_u = $result_u->next_hit()){
>             while (my $hsp_u = $hit_u->next_hsp()){
>                     $hit_u->name;
>                     $hsp_u->evalue;
>                     $hsp_u->score;
>             }
>         }
>     }
> 
> I will appreciate if you could tell me how to do it.
> 
> P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download
> link?)
> 
> 
> 
> The flow is BLAST result:
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> BLASTN 2.2.16 [Mar-25-2007]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (458 letters)
> 
> Database: UniVec (build 4.0)
>            2416 sequences; 597,480 total letters
> Searching..................................................done
>                                                                      Score    E
> Sequences producing significant alignments:                          (bits) Value
> 
> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
> gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
> gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1
> 
> >gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
>           Length = 2781
> 
>  Score = 26.3 bits (13), Expect = 3.1
>  Identities = 13/13 (100%)
>  Strand = Plus / Plus
> 
> Query: 352  tactaccgccatt 364
>             |||||||||||||
> Sbjct: 2209 tactaccgccatt 2221
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> 
> Reginald Hsueh 
> 


From pkuonline at gmail.com  Wed Dec 23 13:36:49 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 12:36:49 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
Message-ID: <200912231236490784820@gmail.com>

Hi Everyone,

I used the latest BioPerl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz, and tried to parse a CODEML result. I searched the mailing list and found that the current PAML parser is compatible with PAML 4.3a (http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html). However, Ziheng Yang recently updated PAML to 4.3b, and I found that the parser does not work with its output. More strangely, I tested it on an old PAML 4.1 result and it also failed.

I have attached my CODEML outputs in case anyone has an idea what is going wrong.

Many thanks in advance!
 				
Best regards,
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.1
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.1
Type: application/octet-stream
Size: 11635 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.3b
Type: application/octet-stream
Size: 11330 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.3b
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0007.obj>

From cjfields at illinois.edu  Wed Dec 23 16:19:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 23 Dec 2009 15:19:48 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231236490784820@gmail.com>
References: <200912231236490784820@gmail.com>
Message-ID: <B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>

Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken the parser in just about every PAML revision.  Unfortunately, I'm not sure when this will be addressed; my hope is sooner rather than later.

Can you file a bioperl bug report for this?  It's the best place to keep track.

http://bugzilla.open-bio.org/

chris



From pkuonline at gmail.com  Wed Dec 23 17:45:54 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 16:45:54 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
Message-ID: <200912231645536094087@gmail.com>

Hi Chris,

Thanks for your reply. I have just submitted this bug to Bugzilla.

Have a nice holiday!
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago



From David.Messina at sbc.su.se  Wed Dec 23 18:23:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 24 Dec 2009 00:23:44 +0100
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se>

Hi Yong,

Could you attach your codeml output to the bug report, too?

I'll take a look at this as soon as I can.


Dave


From maj at fortinbras.us  Thu Dec 24 00:47:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 24 Dec 2009 00:47:10 -0500
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife>

Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ


From bhakti.dwivedi at gmail.com  Fri Dec 25 21:46:51 2009
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Fri, 25 Dec 2009 21:46:51 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
Message-ID: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>

Hi,

Does anyone know how to retrieve the "Source" or the "Species name" for a
given accession number using BioPerl? I have 30,000 accession numbers
for which I need to get the source organisms. Any kind of help will be
appreciated.

Thanks

BD

From maj at fortinbras.us  Fri Dec 25 22:52:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 25 Dec 2009 22:52:10 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use strict;
use warnings;
use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will (probably) work by changing
# to -dbfrom => 'nucleotide'

my @ids = qw(1621261 89318838 68536103 20807972 730439);

# elink: map each submitted protein id to its taxonomy id
my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
}

# taxonomy ids in input order, skipping ids with no link (removed records)
@taxa = grep { defined } @taxa{@ids};

# esummary: fetch the scientific name for each taxonomy id
$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db    => 'taxonomy',
                                    -id    => \@taxa);

while (my $docsum = $factory->next_DocSum) {
    my ($taxid) = $docsum->get_contents_by_name('TaxId');
    my ($name)  = $docsum->get_contents_by_name('ScientificName');
    $names{$taxid} = $name;
}

# map each original id to its organism name (undef if no taxonomy link)
foreach my $id (@ids) {
    my $taxid = $taxa{$id};
    $idmap{$id} = defined $taxid ? $names{$taxid} : undef;
}

# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
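
The chunking itself could be sketched roughly as follows. This is only an
illustration under stated assumptions: chunk_ids is a made-up helper name
(not a BioPerl function), the batch size of 1000 is simply the low end of
the range suggested above, the id list is a stand-in, and the elink and
esummary steps from the example would go where the comment indicates.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of ids into
# array refs holding at most $size ids each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be a positive integer\n"
        unless defined $size && $size =~ /^[1-9]\d*$/;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Stand-in for the real list of 30000 ids.
my @all_ids = (1 .. 2500);

# Assumption: set this to 3 (as suggested above) when sending real queries.
my $pause_seconds = 0;

my @batches = chunk_ids(1000, @all_ids);
for my $i (0 .. $#batches) {
    my @batch = @{ $batches[$i] };
    # ... run the elink + esummary steps on @batch here ...
    printf "batch %d: %d ids\n", $i + 1, scalar @batch;
    # pause between batches, but not after the last one
    sleep $pause_seconds if $pause_seconds && $i < $#batches;
}
```

Collecting the per-batch results into one %idmap keeps the rest of the
script unchanged.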
MAJ


From cjfields at illinois.edu  Sat Dec 26 06:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <AD7C8B9A-61D1-443C-952E-BC7C66E398B2@illinois.edu>


On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ

The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec.

chris

From arpm9 at charter.net  Sun Dec 27 16:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l]  Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

Hi Chris,
I was trying to make sense of this backpacking lite and simply wanted to view the light, but got nowhere and became very frustrated. Please help if you can, or whoever can. Thanks, Pm

From pengyu.ut at gmail.com  Tue Dec 29 11:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython
to comment on the pros and cons of each? I'm sending
this email to both the bioperl and biopython mailing lists, but I hope
that it will not result in any contention.

I assume that the functionality of bioperl and biopython is the
same, i.e., tasks that can be done in bioperl can be done in biopython and
vice versa, as both libraries have been out there for over 10 years.
Please correct me if my understanding is not true.

Given a task that can be done with either bioperl or biopython,
I, in particular, want to know how long it will take to write the
code for the task in each, with the same readability
requirement (see below) and the assumption that users have the same
fluency in perl and python.

python is claimed to be good for maintainability, whereas perl is
criticized for there-are-many-ways-for-a-given-task. Since there are
multiple ways in perl, let us assume that we always use perl in a
readable way.

From jason at bioperl.org  Tue Dec 29 11:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or  
just curious about the advantages/disadvantages of language choice?

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From ak at ebi.ac.uk  Tue Dec 29 11:57:18 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of each? I'm sending
> this email to both the bioperl and biopython mailing lists, but I hope
> that it will not result in any contention.
> 
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
> 
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in each, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
> 
> python is claimed to be good for maintainability, whereas perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is
the same:  Which of the two programming languages are you (or your team)
most proficient in?  Use that language.

Regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom

From sdavis2 at mail.nih.gov  Tue Dec 29 12:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of each? I'm sending
> this email to both the bioperl and biopython mailing lists, but I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.

The two projects have similar goals, but saying that the functionality
is the same would be an extreme oversimplification.  You will need to
define what you want to do and then check to see what the two projects
have to offer.  This will, in general, require perusing the websites
for both projects as well as the relevant documentation.

> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in each, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.

Again, you will want to define the task(s) to be accomplished and then
weigh the pros and cons of each project combined with local expertise.
 If you don't know what you want to do, then you can certainly read
some examples on the websites and see which project strikes you as a
"winner" for you.

> python is claimed to be good for maintainability, whereas perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

These two statements are generalizations that provide little insight
into the strengths or weaknesses of the languages.  In other words,
one can write good or bad code in both languages.

Hope that helps.

Sean

From wenzhiwang1983 at yahoo.com.cn  Tue Dec 29 13:30:02 2009
From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi)
Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST)
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com>

Dear Jason,

PLINK is a very useful program in population genetics, especially in the era of genome-wide SNP scans. Is there any plan to add the PLINK (ped or tped) format to Bio::PopGen::IO?

Thanks.

Wenzhi Wang
   State Key Laboratory of Genetic Resources and Evolution
   Kunming Institute of Zoology, Chinese Academy of Sciences
   Kunming, Yunnan 650223 P. R. China
   Tel:  86 871 5198 993
   Fax: 86 871 5195 430
   E-mail: wenzhiwang1983 at yahoo.com.cn




From pengyu.ut at gmail.com  Tue Dec 29 13:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.

On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich <jason at bioperl.org> wrote:
> Are you asking for the purposes of choosing a toolkit for your work or just
> curious about the advantages/disadvantages of language choice?

From pengyu.ut at gmail.com  Tue Dec 29 14:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of each? I'm sending
>> this email to both the bioperl and biopython mailing lists, but I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>> vice versa, as both libraries have been out there for over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification. You will need to
> define what you want to do and then check to see what the two projects
> have to offer. This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

In your experience, are there some tasks that are easier
with one than with the other?



From alperyilmaz at gmail.com  Tue Dec 29 14:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO,
	Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID: <dac81b0d0912291136x53edf2cjc6728e7062bd3bc1@mail.gmail.com>

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by
using a specific node as the root. I used Bio::Tree::Draw::Cladogram for
drawing and encountered some problems.

When I use the whole tree and draw it, everything is fine; but when I
pick a particular node and construct the subtree from that node's
ancestor by using "my $subtree = Bio::Tree::Tree->new(-root =>
$new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a
faulty EPS file, which contains extra lines in the middle of the
file.
For instance:
.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
 81.25 lineto
  lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto


Also, I tried writing the subtree to a new phyloxml file first and
then drawing it. The code is as follows:

my $savefile = "save.phyloxml";
my $treeout = Bio::TreeIO->new(-format => 'phyloxml',
                               -file   => ">$savefile");
$treeout->write_tree($subtree);

my $tree2 = Bio::TreeIO->new(-format => 'phyloxml',
                             -file   => $savefile);
my $t1 = $tree2->next_tree;

my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree   => $t1,
                                           -top    => 10,
                                           -bottom => 10);
$obj1->print(-file => $image_output);

The generated phyloxml file, save.phyloxml, has an additional
newline between "</phylogeny>" and "</phyloxml>" at the end of the
file, and this additional newline leads to an error when parsing
(opening the file and drawing the EPS). When I removed the newline
manually, Bio::Tree::Draw::Cladogram produced the EPS file successfully.
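
As a stopgap, the manual removal could be scripted. The following is a
minimal sketch, assuming the only problem is the blank line(s) between the
two closing tags; strip_blank_before_close and fix_phyloxml_file are
hypothetical helper names, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse any whitespace between </phylogeny> and
# </phyloxml> into a single newline. Takes and returns the file content
# as a string.
sub strip_blank_before_close {
    my ($xml) = @_;
    $xml =~ s{(</phylogeny>)\s*(</phyloxml>)}{$1\n$2}g;
    return $xml;
}

# Hypothetical helper: apply the fix to a phyloxml file in place.
sub fix_phyloxml_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $xml = do { local $/; <$in> };   # slurp the whole file
    close $in;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} strip_blank_before_close($xml);
    close $out or die "Cannot close $path: $!";
    return;
}

# Example on an in-memory string:
my $sample = "<phyloxml>\n<phylogeny>\n</phylogeny>\n\n</phyloxml>\n";
print strip_blank_before_close($sample);
```

fix_phyloxml_file('save.phyloxml') would have to run after the writer has
finished with the file and before it is read back in. This only works
around the symptom; it does not explain why the extra newline is written.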

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?

The phyloxml file I used can be downloaded from:
http://grassius.org/download/HSF.phyloxml

Run this code with the phyloxml file to see newline character problem:
http://pastebin.com/f87ee1ee

Run this code with the phyloxml file to see faulty eps file problem:
http://pastebin.com/fc4715a1

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954

From pengyu.ut at gmail.com  Tue Dec 29 16:32:17 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 15:32:17 -0600
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>

http://bioperl.org/Core/Latest/modules.html

Many links, if not all, are broken on the above page. Could somebody fix it?

For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
I see the following error.

There is currently no text in this page. You can search for this page
title in other pages, search the related logs, or edit this page.

From jason at bioperl.org  Tue Dec 29 16:49:00 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:49:00 -0800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>

That is an outdated URL; I am not sure where you are linking to it from.
We can probably now disable all old '/Core' URLs.

All documentation links are now under /wiki/.

The beginners' HOWTO, for example, is here:
  http://bioperl.org/wiki/HOWTO:Beginners

> http://www.bioperl.org/wiki/HOWTOs



--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID: <AA645194-F78E-4484-8952-02C40C1270F4@bioperl.org>

Yep, it would be great if someone were to write it.  This being a
volunteer project, we welcome your contribution.  No, I don't
specifically have plans to do it, but maybe you can give it a try, or
perhaps another bioperl user/developer interested in population
genetics will?
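
As a possible starting point, reading one line of a PLINK .ped file might
look roughly like this. It is a standalone sketch, not an existing
Bio::PopGen::IO driver: parse_ped_line is a hypothetical name, and the
column layout assumed is the usual one (family id, individual id, paternal
id, maternal id, sex, phenotype, then two alleles per marker). A real
driver would presumably turn these fields into Bio::PopGen::Individual and
Bio::PopGen::Genotype objects.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one whitespace-delimited PED line into a
# hash ref with the six leading fields and a list of allele pairs.
sub parse_ped_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split ' ', $line;
    die "PED line needs 6 leading columns plus complete allele pairs\n"
        if @fields < 6 || (@fields - 6) % 2;

    my %rec;
    @rec{qw(family_id individual_id paternal_id maternal_id sex phenotype)}
        = splice(@fields, 0, 6);

    # the remaining columns are alleles, two per marker
    my @genotypes;
    push @genotypes, [ splice(@fields, 0, 2) ] while @fields;
    $rec{genotypes} = \@genotypes;
    return \%rec;
}

# Made-up example line with two markers.
my $rec = parse_ped_line("FAM1 IND1 0 0 1 2 A A G T\n");
printf "%s/%s has %d markers\n",
    $rec->{family_id}, $rec->{individual_id}, scalar @{ $rec->{genotypes} };
```

Marker names are not in the .ped file itself; they come from the
accompanying .map file, which a full implementation would also have to read.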

-jason
On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> PLINK is a very useful program in population genetics,
> especially in the era of genome-wide SNP scans. Is there any plan to add
> the PLINK (ped or tped) format to Bio::PopGen::IO?

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org>


On Dec 29, 2009, at 11:15 AM, Peng Yu wrote:

> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>  
> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com>  
>> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> to comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality of bioperl and biopython is the
>>> same, i.e., tasks that can be done in bioperl can be done in biopython
>>> and vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?

As you have still failed to give much insight into the 'tasks' it is  
hard to give you a better answer.

If there is a module or set of routines already written then yes one  
might be easier than the other. Otherwise it just depends on your  
strengths in the programming language.
We discussed the strengths of the different toolkits briefly on the  
podcast last month.  http://twit.tv/floss96

I echo Sean. Use whichever language you are a better programmer in.
BioPerl is more mature in some facets than BioPython, but BioPython
has some components that are more heavily developed and supported than
BioPerl's (structures being one of those, and interfacing them with pyMol
would be a strength). I personally think the Gbrowse, Bio-Graphics,
and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to sequence
databases and features is a critical aspect of mining genomic data
and features, and I use these heavily in my work, making BioPerl easy and
powerful for my tasks. That, and sequence and alignment parsing and
reformatting. But there are comparable tools written in Python, with
and without BioPython, that you can also use, so mainly it is about
building up expertise in a toolkit and going forward. The BioPerl
faithful will probably say it is the more useful toolkit to us, but we are
of course a biased sample.

Both projects can benefit from more users and developers contributing  
code and documentation so I would just jump in and give it a try if  
you are unsure which will be easier for you.

>
>>> Given a task that can be done with either bioperl or biopython,
>>> I, in particular, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From pengyu.ut at gmail.com  Tue Dec 29 17:01:05 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:01:05 +1800
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and CDS boundary for 	a RefSeq ID?
Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>

I see the following example, but it is not clear to me how to get the
exon sequences. I also want to get the exon boundaries and the associated
CDS boundaries. Although I can get the boundary information from the UCSC
Table Browser, it would be convenient if I could get it in bioperl
along with the sequence.

Could somebody let me know how to do it?

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

From sdavis2 at mail.nih.gov  Tue Dec 29 17:13:30 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 17:13:30 -0500
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com>

On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> http://bioperl.org/Core/Latest/modules.html
>
> Many links if not all are broken on the above pages. Could somebody fix it?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

It is unfortunate that the links are broken on that page.  However, I
believe that page is somewhat outdated, anyway.  Here are the HOWTO
pages:

http://www.bioperl.org/wiki/HOWTOs

Sean

From pengyu.ut at gmail.com  Tue Dec 29 17:21:16 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:21:16 +1800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
	<A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com>

On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich <jason at bioperl.org> wrote:
> That is an outdated URL; I am not sure where you are linking to it from. We can
> probably now disable all old '/Core' URLs.

I followed the link from here:

http://www.bioperl.org/wiki/BioPerl_Tutorial

Since those URLs are outdated, could you please fix the links on the above page?

> All documentation links are in the /wiki/
>
> The beginner's howto is here for example
> http://bioperl.org/wiki/HOWTO:Beginners
>
>> http://www.bioperl.org/wiki/HOWTOs
>
>
> On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:
>
>> http://bioperl.org/Core/Latest/modules.html
>>
>> Many links if not all are broken on the above pages. Could somebody fix
>> it?
>>
>> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
>> I see the following error.
>>
>> There is currently no text in this page. You can search for this page
>> title in other pages, search the related logs, or edit this page.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From sdavis2 at mail.nih.gov  Tue Dec 29 18:06:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 18:06:17 -0500
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and 	CDS boundary for a RefSeq ID?
In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I see the following example. But it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and associated
> CDS boundaries. Although, I can get the boundary information from ucsc
> table browser, but it would be convenient if I can get it in bioperl
> along with the sequence.
>
> Could somebody let me know how do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.  There may be some confusion here: the UCSC database aligns
RefSeq sequences to a genome to generate exon start and end
coordinates.  However, the RefSeq records retrieved by Bio::DB::RefSeq
are not in genomic context and so do not have start and end locations
on the genome.  That is, if you want the starts and ends along the
genome, I don't think that information is available from the RefSeq
record itself.  If that is what you need (genomic coordinates), you can
download the information directly from UCSC, download flat files from
NCBI mapview, or even get it from Ensembl (using BioMart, for instance).
If you are looking for a bioperl-compliant way of doing this, look at
the Ensembl Perl API.

Sean

From jkhilmer at gmail.com  Tue Dec 29 14:55:18 2009
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 29 Dec 2009 12:55:18 -0700
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>

Personally, I think that the differences between Python and Perl
(although substantial) are not large enough to make the language
itself the deciding factor.

Instead, consider the larger community of software.  I haven't yet
found a situation in which Python cannot be applied: it can be used
with R (statistics); lower-level code in C or Fortran; visualization
software such as PyMol, Chimera, Blender, VTK; plotting with
matplotlib; and scipy/numpy or sage, which provide innumerable
benefits for computation, data-processing, etc.

Although I don't claim to have a great deal of experience with Perl, I
haven't seen the same integration with that language: I'm assuming it
can be used with R and VTK (not sure about C or Fortran?).  For this
reason, unless your work is highly targeted and you have no use for
programming-language integration with other software, I would
recommend Python.

For perl experts, I would truly appreciate any corrections you could
offer to these observations of mine, since I wouldn't mind using perl
if it offers benefits either in general or for specific applications.


Jonathan

On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> to comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality of bioperl and biopython is the
>>> same, i.e., tasks that can be done in bioperl can be done in biopython
>>> and vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?
>
>>> Given a task that can be done with either bioperl or biopython,
>>> I, in particular, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From wgheath at gmail.com  Tue Dec 29 15:16:39 2009
From: wgheath at gmail.com (William Heath)
Date: Tue, 29 Dec 2009 12:16:39 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
	<81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
Message-ID: <f08ddf990912291216h32988b8cv20830c1b6701caf6@mail.gmail.com>

The biggest reason to go with python is the ease of use.  Biologists are not
programmers and the learning curve for python is much gentler than that of
perl.  I like perl but choose python because of this issue.  Perl 6 does
address some of these issues, but it has not been fully implemented
yet.

-Tim

P.S.

I love, love, love CPAN though, which is only for perl right now :(

On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer <jkhilmer at gmail.com> wrote:

> Personally, I think that the differences between Python and Perl
> (although substantial) are not large enough to make the language
> itself the deciding factor.
>
> Instead, consider the larger community of software.  I haven't yet
> found a situation in which Python cannot be applied: it can be used
> with R (statistics); lower-level code C or fortran; visualization
> software such as PyMol, Chimera, Blender, VTK; plotting with
> matplotlib; and scipy/numpy or sage, which provide innumerable
> benefits for computation, data-processing, etc.
>
> Although I don't claim to have a great deal of experience with Perl, I
> haven't seen the same integration with that language: I'm assuming it
> can be used with R and VTK (not sure about C or fortran?).  For this
> reason, unless your work is highly targeted and you have no use
> programming language integration with other software, I would
> recommend Python.
>
> For perl experts, I would truly appreciate any corrections you could
> offer to these observations of mine, since I wouldn't mind using perl
> if it offers benefits either in general or for specific applications.
>
>
> Jonathan
>
> On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>> May I ask somebody who is versatile in both bioperl and biopython
> >>> to comment on the pros and cons of bioperl and biopython? I'm sending
> >>> this email to both bioperl and biopython mailing lists. But I hope
> >>> that it will not result in any contention.
> >>>
> >>> I assume that the functionality of bioperl and biopython is the
> >>> same, i.e., tasks that can be done in bioperl can be done in biopython
> >>> and vice versa, as both libraries have been out there over 10 years.
> >>> Please correct me if my understanding is not true.
> >>
> >> The two projects have similar goals, but saying that the functionality
> >> is the same would be an extreme oversimplification.  You will need to
> >> define what you want to do and then check to see what the two projects
> >> have to offer.  This will, in general, require perusing the websites
> >> for both projects as well as the relevant documentation.
> >
> > According to your experience, are there some tasks that are easier
> > with one than with another?
> >
> >>> Given a task that can be done with either bioperl or biopython,
> >>> I, in particular, want to know how long it will take to write the
> >>> code for the task in bioperl and biopython, with the same readability
> >>> requirement (see below) and the assumption that users have the same
> >>> fluency in perl and python.
> >>
> >> Again, you will want to define the task(s) to be accomplished and then
> >> weigh the pros and cons of each project combined with local expertise.
> >>  If you don't know what you want to do, then you can certainly read
> >> some examples on the websites and see which project strikes you as a
> >> "winner" for you.
> >>
> >>> python is claimed to be good for maintainability. But perl is
> >>> criticized for there-are-many-ways-for-a-given-task. Since there are
> >>> multiple ways in perl, let us assume that we always use perl in a
> >>> readable way.
> >>
> >> These two statements are generalizations that provide little insight
> >> into the strengths or weaknesses of the languages.  In other words,
> >> one can write good or bad code in both languages.
> >>
> >> Hope that helps.
> >>
> >> Sean
> >>
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From pengyu.ut at gmail.com  Wed Dec 30 12:26:45 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 31 Dec 2009 11:26:45 +1800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>

With Bio::SeqIO, I can only read the records in a fasta file one by
one. This is preferable if there are many records in a file.

But I also want to read all the records in at once. I could use a while
loop to read them all, but could somebody let me know if there is a
function in bioperl that can read in all the records at once and return
them to me as an object?

http://www.bioperl.org/wiki/HOWTO:SeqIO

From sdavis2 at mail.nih.gov  Wed Dec 30 13:04:53 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 30 Dec 2009 13:04:53 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>

On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?

In perl, you can use an array to store the records.  You could also
use a hash if you have reasonable keys for the entries.
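
A minimal sketch of the array/hash approach, assuming BioPerl is
installed. The temporary file and the IDs seq1/seq2 are made up so the
example is self-contained; with a real file, pass its path to -file
instead:

```perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Write a tiny two-record FASTA file (illustrative data only).
my ($fh, $fasta_file) = tempfile(SUFFIX => '.fa', UNLINK => 1);
print {$fh} ">seq1 first record\nACGTACGT\n>seq2 second record\nGGCCTTAA\n";
close $fh;

my $in = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');

my (@records, %by_id);
while (my $seq = $in->next_seq) {
    push @records, $seq;                # array keeps the file order
    $by_id{ $seq->display_id } = $seq;  # hash gives lookup by identifier
}

printf "%d records read\n", scalar @records;
print $by_id{seq2}->seq, "\n";
```

The hash only makes sense if the display IDs are unique; a repeated ID
would silently overwrite the earlier record.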

Sean


> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jason at bioperl.org  Wed Dec 30 14:58:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 Dec 2009 11:58:54 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org>

Or use a database object so you can retrieve sequences that have a
particular ID. See Bio::DB::Fasta.
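
A rough sketch of what that could look like, assuming BioPerl is
installed. The file contents and the IDs seq1/seq2 are invented for
illustration; Bio::DB::Fasta writes an index next to the FASTA file the
first time it is opened, so the directory must be writable:

```perl
use strict;
use warnings;
use Bio::DB::Fasta;
use File::Temp qw(tempdir);

# Create a small FASTA file in a scratch directory (illustrative data only).
my $dir        = tempdir(CLEANUP => 1);
my $fasta_file = "$dir/example.fa";
open my $fh, '>', $fasta_file or die "Cannot write $fasta_file: $!";
print {$fh} ">seq1\nACGTACGTAC\n>seq2\nGGCCTTAAGG\n";
close $fh;

# Index the file; later lookups go through the index, not a full scan.
my $db = Bio::DB::Fasta->new($fasta_file);

my $whole = $db->seq('seq2');            # full sequence as a string
my $part  = $db->seq('seq1', 3 => 6);    # subsequence, 1-based and inclusive
my $obj   = $db->get_Seq_by_id('seq2');  # sequence object for the same record

print "$whole\n$part\n", $obj->length, "\n";
```

This does not hold every record in memory; it trades that for random
access by ID, which is usually what is wanted for large files.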
On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:

> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>> by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and  
>> return
>> me an object?
>
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Wed Dec 30 16:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

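use Bio::AlignIO;

# each_seq (not each_seqs) returns the sequences held in the alignment
my $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
my $aln   = $alnio->next_aln;
my @seqs  = $aln->each_seq;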

MAJ
----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?


> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>

From David.Messina at sbc.su.se  Thu Dec 31 05:55:32 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 31 Dec 2009 11:55:32 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
Message-ID: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From David.Messina at sbc.su.se  Tue Dec  1 05:14:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 1 Dec 2009 11:14:40 +0100
Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem
	to be parsed
In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk>
	<50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se>
	<8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <ECCDC4FE-DF46-4CF8-806F-750837DED8AA@sbc.su.se>

Hi Mick,

Did you try running the test case that you had originally attached to the bug report? Or is the output below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave


On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
> 
> Just got round to looking at this.
> 
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
> 
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
> 
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
> 
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
> 
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
> 
> Is this another bug report?
> 
> Thanks again for all your help
> 
> Mick
> 
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
> 
> Hi Mick,
> 
> Sure thing -- the current build from subversion is packaged up every  
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
> 
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
> 
> 
> Dave
> 
> 
> 
> 
> On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote:
> 
>> Hi Dave
>> 
>> Thanks for the hard work.
>> 
>> Trying to get the latest updates so I can use this... don't have svn  
>> on my server, tried to install it and I don't have python either,  
>> which is needed to install it.
>> 
>> I face about 3 weeks whilst my IT department sort this out, unless I  
>> can access the changes any other way?
>> 
>> Thanks
>> Mick
>> 
>> -----Original Message-----
>> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- 
>> daemon at portal.open-bio.org]
>> Sent: 20 November 2009 15:12
>> To: michael watson (IAH-C)
>> Subject: [Bug 2937] Strand in fasta35 output does not seem to be  
>> parsed
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2937
>> 
>> 
>> online at davemessina.com changed:
>> 
>>          What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>            Status|NEW                         |RESOLVED
>>        Resolution|                            |FIXED
>> 
>> 
>> 
>> 
>> ------- Comment #7 from online at davemessina.com  2009-11-20 10:12 EST  
>> -------
>> Fixed in r16394.
>> 
>> Michael, thanks for the report. Your test cases pass, but please  
>> reopen the bug
>> if needed.
>> 
>> 
>> -- 
>> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? 
>> tab=email
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
> 


From e.osimo at gmail.com  Tue Dec  1 13:05:48 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 1 Dec 2009 19:05:48 +0100
Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test
Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com>

Hello everyone,
I'm trying to get the p value of a test performed with Statistics::TTest.
I cannot find a function for this: I can find out whether the null hypothesis
is rejected at a certain confidence level, but I cannot make the script show
me the actual p value.
Do you know of other scripts that can do that?

Thanks
Emanuele


From cjfields at illinois.edu  Tue Dec  1 14:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below.  I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.
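
A minimal sketch in plain Perl of what a compliant request URL could
look like. The helper names uri_encode and eutil_url, the tool name and
the e-mail address are placeholders, and the esearch.fcgi endpoint path
comes from NCBI's public E-utilities documentation rather than from the
announcement; this is not the actual change made to the BioPerl modules:

```perl
use strict;
use warnings;

# Percent-encode everything outside the unreserved URI characters
# (assumes plain ASCII input).
sub uri_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build an E-utility URL, refusing to do so without tool and email.
sub eutil_url {
    my ($eutil, %params) = @_;
    for my $required (qw(tool email)) {
        die "E-utility requests need a non-empty '$required' parameter\n"
            unless defined $params{$required} && length $params{$required};
    }
    my $base  = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/$eutil.fcgi";
    my $query = join '&',
        map { uri_encode($_) . '=' . uri_encode($params{$_}) } sort keys %params;
    return "$base?$query";
}

my $url = eutil_url(
    'esearch',
    db    => 'nucleotide',
    term  => 'BRCA1[gene] AND human[orgn]',
    tool  => 'my_script',              # placeholder tool name
    email => 'someone@example.org',    # placeholder contact address
);
print "$url\n";
```

Checking for the two parameters on the client side means a missing value
fails immediately in the script instead of coming back as an error from
NCBI.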

chris

Begin forwarded message:

> From: <utilities-announce at ncbi.nlm.nih.gov>
> Date: December 1, 2009 12:59:34 PM CST
> To: <utilities-announce at ncbi.nlm.nih.gov>
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
> 
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>  
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>  
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>  
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.
>  
> NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
>  
> Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.
>  
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
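
For anyone updating scripts ahead of the deadline, here is a minimal sketch of building a request URL that always carries both parameters. It uses only core Perl. The esearch.fcgi endpoint, the query term, the tool name and the e-mail address are illustrative assumptions, not values from the announcement, and the helper names are made up for this example.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
# Assumes plain ASCII input, which is enough for this illustration.
sub uri_escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build an E-utility URL and refuse to continue if tool or email is empty.
sub eutils_url {
    my ($base, %params) = @_;
    for my $required (qw(tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $params{$required} && length $params{$required};
    }
    my $query = join '&',
        map { $_ . '=' . uri_escape_value($params{$_}) } sort keys %params;
    return "$base?$query";
}

# Hypothetical values: replace with your own tool name and contact address.
my $url = eutils_url(
    'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi',
    db    => 'pubmed',
    term  => 'Trypanosoma brucei[ORGN]',
    tool  => 'my_rnai_tool',
    email => 'maintainer@example.org',
);
print "$url\n";
```

The script only builds and prints the URL; actually sending the request (for example with LWP::UserAgent) is left out on purpose.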


From maj at fortinbras.us  Tue Dec  1 21:27:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 21:27:06 -0500
Subject: [Bioperl-l] test test test
Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife>

MAJ


From ocarnorsk138 at gmail.com  Tue Dec  1 21:59:48 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Tue, 1 Dec 2009 23:59:48 -0300
Subject: [Bioperl-l] test test test
In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>

test test test test back


O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.


2009/12/1 Mark A. Jensen <maj at fortinbras.us>

> MAJ


From maj at fortinbras.us  Tue Dec  1 22:08:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 22:08:23 -0500
Subject: [Bioperl-l] test test test
In-Reply-To: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
	<b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
Message-ID: <CC7F9A12F9474D2BB5DC4E69190F2AE6@NewLife>

I love when people are paying attention!


From rtbio.2009 at gmail.com  Wed Dec  2 07:07:08 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Wed, 2 Dec 2009 13:07:08 +0100
Subject: [Bioperl-l] Remote blast
Message-ID: <c7cac1600912020407j176c83edm9f5a3d151f507bd2@mail.gmail.com>

Hello everyone,

I have a problem. I am new to BioPerl. I am working on an RNAi tool in which a
CGI script connects to NCBI BLAST using the remote BLAST program.

The sequence entered on the HTML page is taken as input, and a remote BLAST
search is run on it. However, I have a problem in the remote BLAST code.

My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
$input1= $_[0];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


 while (my $input = $str->next_seq())

{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

  print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
          my $filename = $result->query_name()."\.out";
           $factory->save_output($filename);
          $factory->remove_rid($rid);
         #       open(BLASTDEBUGFILE,'>',$blastdebugfile);
  #     print BLASTDEBUGFILE "Test1  $result";
   #     close(BLASTDEBUGFILE);

     open(OUTFILE,'>',$outfile);
     print OUTFILE "Test2 $result->database_name()";
     close(OUTFILE);

    while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);

              # open(OUTFILE,'>',$outfile);
              # print OUTFILE "in while hits";
              #close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
}
# open(OUTFILE,'>',$outfile);
  #print OUTFILE $seqs[0];
 # close(OUTFILE);

return(@seqs);
}
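
A side note on the line `print OUTFILE "Test2 $result->database_name()";` above: Perl does not interpolate method calls inside double-quoted strings, so that line prints the stringified object followed by the literal text `->database_name()`. This is probably unrelated to the missing hits. A small core-Perl sketch, using a made-up stand-in class (Demo::Result) in place of the real result object:

```perl
use strict;
use warnings;

# Stand-in class (hypothetical) so the example runs without BioPerl.
package Demo::Result;
sub new           { my ($class, %args) = @_; return bless {%args}, $class }
sub database_name { return $_[0]{db} }

package main;

my $result = Demo::Result->new(db => 'refseq_rna');

# Only $result is interpolated; "->database_name()" stays literal text,
# giving something like "Test2 Demo::Result=HASH(0x...)->database_name()".
my $wrong = "Test2 $result->database_name()";

# Call the method outside the string...
my $right = 'Test2 ' . $result->database_name();

# ...or interpolate the call through an anonymous array dereference.
my $also_right = "Test2 @{[ $result->database_name() ]}";

print "$wrong\n$right\n$also_right\n";
```

Either of the last two forms writes the database name itself to the file.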

In the above code, my program gets as far as the 'else' branch and writes the
output file, i.e., this step:
my $filename = $result->query_name()."\.out";

But when I try to enter the next while loop, where I should get the hits, the
program does not enter it, i.e., it does not enter this:
while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);


Hence I am unable to get any hits for my query.
For example, if the query's accession number is Tb11.02.2210, I only get the
file Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.


From ashvip at gmail.com  Wed Dec  2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install BioPerl on my 32-bit Windows machine despite
repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh,
Senior Research Fellow,
Centre for Cellular and Molecular Biology,
Hyderabad - 500007
India.
contact - 91-040-27192778


From scott at scottcain.net  Wed Dec  2 09:18:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Dec 2009 09:18:37 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com>

Hello Vipin,

"do not seem to work" doesn't give us much to go on; can you tell us
what happened?

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From maj at fortinbras.us  Wed Dec  2 09:18:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information; your commands, error messages you received.
Thanks, 
Mark

From bcantarel at som.umaryland.edu  Wed Dec  2 13:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>

Hi all,
I am not sure if this is normal, but when I use Bio::SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.


For example, I have a sequence with a CDS on the minus strand running from 911 to 974.  The sequence is 974 nt long.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates?  Is there a command for that or will I have to just do the math?

Feature or Bug?
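
If it does come down to doing the math, the arithmetic is short. The sketch below assumes the reported 1..64 are positions counted on the reverse-complemented 974 nt sequence, which matches the numbers in this post but is only a guess at what happened. The function name is made up for illustration.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive range given relative to the reverse-complemented
# sequence back onto forward-strand coordinates of a sequence of $length nt.
sub revcom_to_forward {
    my ($start, $end, $length) = @_;
    die "range $start..$end does not fit in 1..$length\n"
        unless 1 <= $start && $start <= $end && $end <= $length;
    return ($length - $end + 1, $length - $start + 1);
}

my ($fwd_start, $fwd_end) = revcom_to_forward(1, 64, 974);
print "$fwd_start..$fwd_end\n";   # prints 911..974
```

Applying the same function twice returns the original range, which is a quick sanity check on the formula.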


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore


From maj at fortinbras.us  Wed Dec  2 14:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
Message-ID: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an 
ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ


From bcantarel at som.umaryland.edu  Wed Dec  2 14:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
		       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
 F1:foreach $cds (@feats) {
	next F1 unless ($cds->primary_tag() eq 'CDS');
	#do something with the cds start and cds end
	}
}
	 

LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...


From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore



From maj at fortinbras.us  Wed Dec  2 14:48:44 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:48:44 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>

With fake sequence data and that header, I don't see the problem:

  DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'
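
For comparison, the mapping from the feature table line `complement(911..974)` to those three values can be sketched by hand. This is a simplified core-Perl illustration, not BioPerl's own location parser, and the function name is hypothetical; it only handles plain ranges and a single complement().

```perl
use strict;
use warnings;

# Parse a simple GenBank location such as "911..974" or "complement(911..974)".
# Joins, fuzzy ends (<, >) and remote references are deliberately not handled.
sub parse_simple_location {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.+)\)$/) {
        ($loc, $strand) = ($1, -1);
    }
    my ($start, $end) = $loc =~ /^(\d+)\.\.(\d+)$/
        or die "unsupported location '$loc'\n";
    return { start => $start, end => $end, strand => $strand };
}

my $cds = parse_simple_location('complement(911..974)');
print "start=$cds->{start} end=$cds->{end} strand=$cds->{strand}\n";
```

The point is that start and end stay in forward-strand coordinates, and the minus strand is recorded separately.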

Are you using the latest BioPerl (1.6.1 or the trunk)?
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank


Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
 F1:foreach $cds (@feats) {
next F1 unless ($cds->primary_tag() eq 'CDS');
###>> debugger stops here for above output

#do something with the cds start and cds end
}
}




From cjfields at illinois.edu  Wed Dec  2 14:39:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 13:39:40 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu>

That one's odd; the coordinates should relate back to the original sequence.  Any chance you could pass on the sequence file so we can confirm it?  (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)

chris



From maj at fortinbras.us  Wed Dec  2 15:52:28 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 15:52:28 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife>

Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
as if there is a bug. If you can provide data that can reproduce
it, as Chris suggests, we can get onto it. 
thanks MAJ
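
The correspondence between the decimal and dotted forms can be checked with the core version module. A small sketch (the helper name is made up for illustration):

```perl
use strict;
use warnings;
use version;

# Convert a decimal version number into its normalized dotted form.
sub dotted_version {
    my ($decimal) = @_;
    return version->parse($decimal)->normal;
}

for my $decimal ('1.006', '1.006001') {
    printf "%-9s => %s\n", $decimal, dotted_version($decimal);
}
```

Each group of three decimal digits becomes one dotted component, so 1.006 normalizes to v1.6.0 and 1.006001 to v1.6.1.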
  ----- Original Message ----- 
  From: Brandi Cantarel 
  To: Mark A. Jensen 
  Sent: Wednesday, December 02, 2009 3:38 PM
  Subject: Re: [Bioperl-l] Parsing Genbank


  How can I tell what version I am using? When I use the command from the website:


  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'


  I get 1.006, but the BioPerl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release?


  Brandi




From cjfields at illinois.edu  Wed Dec  2 16:07:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 15:07:58 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
	<07332179362A4D53ACAA9A72AD208049@NewLife>
Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu>

One never knows, but I would be very surprised if this somehow snuck past the test suite we have, particularly since GBrowse uses SeqFeatures extensively (any changes should have popped out along the way). 

There is not much we can do unless we have something to help confirm the problem.  It might also help to know the source of the GenBank file itself.

chris

On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote:

> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
> as if there is a bug. If you can provide data that can reproduce
> it, as Chris suggests, we can get onto it. 
> thanks MAJ
>  ----- Original Message ----- 
>  From: Brandi Cantarel 
>  To: Mark A. Jensen 
>  Sent: Wednesday, December 02, 2009 3:38 PM
>  Subject: Re: [Bioperl-l] Parsing Genbank
> 
> 
>  How can I tell what version I am using?When I use the command from the website:
> 
> 
>  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 
> 
>  I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?.
> 
> 
>  Brandi
> 
> 
> 
> 
>  On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:
> 
> 
>    with fake seq data and that header, I don't get a problem:
> 
>    DB<2> x $cds->location
>    0  Bio::Location::Simple=HASH(0x37b1df4)
>     '_end' => 974
>     '_location_type' => 'EXACT'
>     '_root_verbose' => 0
>     '_seqid' => 'subjpool12_contig3'
>     '_start' => 911
>     '_strand' => '-1'
> 
>    Are you using the latest BioPerl (1.6.1 or the trunk) ?
>    MAJ
>    ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
>    Cc: <bioperl-l at lists.open-bio.org>
>    Sent: Wednesday, December 02, 2009 2:29 PM
>    Subject: Re: [Bioperl-l] Parsing Genbank
> 
> 
>    Here is some of my code, the real code actually enters the data into a database.
> 
> 
>    $in  = Bio::SeqIO->new(-file => $gbkfile,
>         '-format' => 'genbank');
> 
>    W1:while (my $seq = $in->next_seq()) {
>    my @feats = $seq->get_all_SeqFeatures();
>    my $j = 0;
>    F1:foreach $cds (@feats) {
>    next F1 unless ($cds->primary_tag() eq 'CDS');
>    ###>> debugger stops here for above output
> 
>    #do something with the cds start and cds end
>    }
>    }
> 
> 
>    LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
>    ACCESSION   subjpool12_contig3
>    KEYWORDS    .
>    SOURCE      human metagenome
>    ORGANISM  human metagenome
>              unclassified sequences; organismal metagenomes,metagenomes.
>    FEATURES             Location/Qualifiers
>       source          1..974
>                       /mol_type="genomic DNA"
>                       /isolation_source="Homo sapiens"
>                       /organism="human metagenome"
>                       /collection_date="19-Nov-2009"
>       CDS             complement(911..974)
>                       /locus_tag="subjpool12_contig3|metagene|gene_2"
>                       /translation="IRIMTVELINPYIRHVEHST"
>                       /score="2.52804"
>                       /product="hypothetical protein"
>                       /note="score=2.52804"
>                       /note="score=2.52804"
>                       /note="frame=1"
>    ORIGIN
>    #some sequence?.
> 
> 
> 
> 
> 
>      From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
> 
> 
> 
> 
>    ~~~~~~~~~~~~~~~~~~~~
>    Brandi Cantarel, PhD
>    Bioinformatics Analyst
>    Institute for Genome Sciences
>    School of Medicine
>    University of Maryland, Baltimore
> 
>    On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
> 
> 
>      Hi Brandi-
> 
>      If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> 
>      Can you elaborate by posting your code?
> 
>      cheers,
> 
>      MAJ
> 
>      ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
> 
>      To: <bioperl-l at lists.open-bio.org>
> 
>      Sent: Wednesday, December 02, 2009 1:36 PM
> 
>      Subject: [Bioperl-l] Parsing Genbank
> 
> 
> 
> 
> 
>        Hi all,
> 
>        I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.
> 
> 
> 
> 
> 
>        For example, I have a sequence that has a CDS on the minus strand, and it runs from 911 to 974.  The sequence is 974 nt.
> 
> 
> 
>        x $cds->start
> 
>        1
> 
>        x $cds->end
> 
>        64
> 
> 
> 
>        How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
> 
> 
> 
>        Feature or Bug?
> 
> 
> 
> 
> 
>        ~~~~~~~~~~~~~~~~~~~~
> 
>        Brandi Cantarel, PhD
> 
>        Bioinformatics Analyst
> 
>        Institute for Genome Sciences
> 
>        School of Medicine
> 
>        University of Maryland, Baltimore
> 
> 
> 
> 
> 
>        _______________________________________________
> 
>        Bioperl-l mailing list
> 
>        Bioperl-l at lists.open-bio.org
> 
>        http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lstein at cshl.edu  Thu Dec  3 05:31:31 2009
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 Dec 2009 05:31:31 -0500
Subject: [Bioperl-l] modENCODE seeking data managers
Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com>

Hi All,

My apologies for spamming the list, but this announcement may be of
interest:


The modENCODE Data Coordinating Center (Model Organism Encyclopedia of DNA
Elements; www.modencode.org) is seeking data managers to gather and curate
large scale functional genomics data sets in fly and worm. For details, see
http://blog.modencode.org/?p=350.


Lincoln

-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From dan.bolser at gmail.com  Thu Dec  3 06:44:40 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 11:44:40 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>

Hi, can someone test the script here on zero length fasta / qual files?

http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ


It seems the output has an extra newline in the sequence part of the
output, which throws off scripts that rely on the 'four lines per
record' structure of FASTQ (although I'm not sure whether it is
actually illegal FASTQ).

Here is what I see

BEGIN
$ head one.fna
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ head one.qual
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ createFastq.plx one.fna one.qual
@FVF7ZWH02PFOVG


+FVF7ZWH02PFOVG

END


Currently I just put in a clause in the script to skip any zero length
sequences, but I think the Qual shouldn't output an extra newline like
this.


Cheers,
Dan.


--

JHB: Bioinformatics is Biology and Biology is Bioinformatics.


From biopython at maubp.freeserve.co.uk  Thu Dec  3 07:12:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 12:12:15 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>

On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi, can someone test the script here on zero length fasta / qual files?
>
> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>
> It seems the output has an extra newline in the sequence part of the
> output (which throws off scripts that rely on the 'four lines per
> record' structure of the fastq (although I'm not sure if it's illegal
> fastq).

Hi Dan,

The OBF consensus was FASTQ records with a zero length
sequence might be useful, and should be output as exactly
four lines (one blank sequence line, one blank quality line).
However for parsing, any number of blank lines should be OK.
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html

I can confirm the perl script currently outputs a FASTQ file
with TWO blank lines for the sequence, giving five lines in
total for the zero length record. That does suggest a bug.
What version of BioPerl are you running?

Peter

P.S. The script is throwing away any description after the
identifier.
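
The four-line convention described above can be sketched in plain Perl, without BioPerl. This is only an illustration: format_fastq_record is a hypothetical helper, not a BioPerl function, and it writes a bare '+' separator line rather than repeating the identifier as the wiki script's output does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): format one FASTQ record as
# exactly four lines, even when the sequence and quality strings are empty.
sub format_fastq_record {
    my ($id, $desc, $seq, $qual) = @_;

    # Keep the description after the identifier when there is one.
    my $header = (defined $desc && length $desc) ? "$id $desc" : $id;

    # The trailing empty element makes join() end the record with a newline.
    return join "\n", '@' . $header, $seq, '+', $qual, '';
}

# Zero-length record taken from the example in this thread.
print format_fastq_record('FVF7ZWH02PFOVG',
                          'length=0 xy=2116_2074 region=2', '', '');
```

For the zero-length read this prints the header, one blank sequence line, the '+' line and one blank quality line, so scripts that count four lines per record stay in step.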


From dan.bolser at gmail.com  Thu Dec  3 08:07:27 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 13:07:27 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi, can someone test the script here on zero length fasta / qual files?
>>
>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>
>> It seems the output has an extra newline in the sequence part of the
>> output (which throws off scripts that rely on the 'four lines per
>> record' structure of the fastq (although I'm not sure if it's illegal
>> fastq).
>
> Hi Dan,
>
> The OBF consensus was FASTQ records with a zero length
> sequence might be useful, and should be output as exactly
> four lines (one blank sequence line, one blank quality line).
> However for parsing, any number of blank lines should be OK.
> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>
> I can confirm the perl script currently outputs a FASTQ file
> with TWO blank lines for the sequence, giving five lines in
> total for the zero length record. That does suggest a bug.
> What version of BioPerl are you running?

Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I
asked this question of the list rather than filing a bug report. What
version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
for the info).


> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly,
MediaWiki's diff features are less than optimal, so developing scripts
on the wiki isn't ideal. Anyone know how to plug git-hub into a script
apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could
basically build a whole release in the wiki. Which would be fun if
nothing else!


-- 

JHP: Biology is bioinformatics and bioinformatics is biology.


From heyne at informatik.uni-freiburg.de  Thu Dec  3 08:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

...

	my $ret;
	if ($self->strand() >= 0) {
		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
	} else {
		$ret = $id . $v. $char1 . $end . $char2 . $st ;
	}
	return $ret;
}
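
The convention behind this change (coordinates written in descending order when the strand is -1) can be illustrated independently of BioPerl. The sketch below is an assumption-laden toy: parse_nse and format_nse are hypothetical names, it uses '/' and '-' as the only separators, and it is not the code in LocatableSeq.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name/from-to" into id, start, end and strand.
# start <= end always holds; descending coordinates mean strand -1.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $from, $to) = $nse =~ m{^(\S+)/(\d+)-(\d+)$}
        or return;
    my $strand = 1;
    ($from, $to, $strand) = ($to, $from, -1) if $from > $to;
    return { id => $name, start => $from, end => $to, strand => $strand };
}

# Hypothetical helper: write the coordinates back in the original order.
sub format_nse {
    my ($loc) = @_;
    my ($from, $to) = $loc->{strand} < 0
        ? ($loc->{end},   $loc->{start})
        : ($loc->{start}, $loc->{end});
    return sprintf '%s/%d-%d', $loc->{id}, $from, $to;
}

my $loc = parse_nse('AB194432.1/908-846');
print "$loc->{start} $loc->{end} $loc->{strand}\n";
print format_nse($loc), "\n";
```

One limitation of encoding the strand in the coordinate order: a single-residue location (start equal to end) cannot carry strand information, so this sketch reports it as strand 1.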

Then I noticed while using $aln->remove_seq() that it cannot remove a
sequence, because it uses the wrong NSE to look up sequences. I changed
the following:

SimpleAlign.pm

sub remove_seq {

...
	$id = $seq->id();
    	$start = $seq->start();
    	$end  = $seq->end();

## changed code:

	my $v = $seq->version ? '.'.$seq->version : '';
    	if ($seq->strand >=0){
		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
	} elsif ($seq->strand == -1){
		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
	}	
...

}

The above change in LocatableSeq.pm works when I read an alignment in
Stockholm format and write it out in ClustalW format. But when I read
an alignment in ClustalW format and write it out as Stockholm (or
something else), it did not work, because the strand is not set
correctly in ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

...

	my ( $sname, $start, $end, $strand );	## strand added
	$strand = 0;				## new, standard = 0???
    	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
        	( $sname, $start, $end ) = ( $1, $2, $3 );
		$strand = 1;			## new			
		if ($start > $end) {		## new
       		($start, $end, $strand) = ($end, $start, -1); ##new
		}				## new
	
      }
        else {
            ( $sname, $start ) = ( $name, 1 );
            $strand = 0;    ## reset: no coordinates in the name, so do not reuse the previous strand
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq   => $alignments{$name},
            -id    => $sname,
            -start => $start,
            -end   => $end,
	    -strand=> $strand			## new
        );

...

}

So I don't know whether I changed things in the right places; I only
found them because I happened to use certain functions. I don't know
how broadly a changed NSE in LocatableSeq.pm affects other modules and
functions. But I'm happy with my changes (so far :-)...).

Will you change this in the way you proposed in bioperl trunk?

Thanks!

steffen


Chris Fields schrieb:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
> 
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
> 
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks.  Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
> 
> Does anyone have a problem with me adding this in?  It will change
> output for only those instances where the strand is -1, so
> 
> AB194432.1/908-846
> 
> would be start = 846, end = 908, strand = -1
> 
> AB194432.1/846-908
> 
> would be start = 846, end = 908, strand = 1
> 
> chris


-- 
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany

Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de


From cjfields at illinois.edu  Thu Dec  3 08:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>

Dan,

On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

> 2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
>> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> Hi, can someone test the script here on zero length fasta / qual files?
>>> 
>>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>> 
>>> It seems the output has an extra newline in the sequence part of the
>>> output (which throws off scripts that rely on the 'four lines per
>>> record' structure of the fastq (although I'm not sure if it's illegal
>>> fastq).
>> 
>> Hi Dan,
>> 
>> The OBF consensus was FASTQ records with a zero length
>> sequence might be useful, and should be output as exactly
>> four lines (one blank sequence line, one blank quality line).
>> However for parsing, any number of blank lines should be OK.
>> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>> 
>> I can confirm the perl script currently outputs a FASTQ file
>> with TWO blank lines for the sequence, giving five lines in
>> total for the zero length record. That does suggest a bug.
>> What version of BioPerl are you running?
> 
> Hi Peter,
> 
> Basically, I'm not running the 'latest' version of BP, which is why I
> asked this question of the list rather than filing a bug report. What
> version are you running? ;-)
> 
> Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
> for the info).

FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN).  Basically, it now parses all three FASTQ variants.  However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1.  Peter can you confirm that?

>> Peter
>> 
>> P.S. The script is throwing away any description after the
>> identifier.
> 
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
> 
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc.  Think SourceForge, but a lot nicer and with no ads ;>

BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN).  Basically, it now parses
> all three FASTQ variants.  However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?

I had BioPerl from SVN circa 1.6.1 (not sure if this was before
or after the release of 1.6.1 now):

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069
$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
1.0069

If the tuples mean anything to you:

$ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
49.46.48.48.54.57
$ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
49.46.48.48.54.57

I just updated to revision 16435, and retested. I get the same
BioPerl version numbers, and the same extra blank line in the
sequence FASTQ output as Dan reported.

Peter


From cjfields at illinois.edu  Thu Dec  3 09:39:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 08:39:35 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>

On Dec 3, 2009, at 8:20 AM, Peter wrote:

> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> FASTQ parsing had undergone a major revision prior to
>> 1.6.1 (the latest release in CPAN).  Basically, it now parses
>> all three FASTQ variants.  However, Peter indicates there
>> may still be a problem, and it's likely he's running 1.6.1.
>> Peter can you confirm that?
> 
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
> 
> If the tuples mean anything to you:
> 
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
> 
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
> 
> Peter

Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)?  Otherwise it might get lost on the mail list or wiki.

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.

> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

  my $bsq_obj =
    Bio::Seq::Quality->
	new( -id   => $seq_obj->id,
	     -seq  => $seq_obj->seq,
	     -qual => $qual_obj->qual,
	   );

With:

  my $bsq_obj =
    Bio::Seq::Quality->
	new( -id   => $seq_obj->id,
	     -description => $seq_obj->description,
             -seq  => $seq_obj->seq,
	     -qual => $qual_obj->qual,
	   );

Look - I seem to be learning Perl by osmosis ;)

Peter


From dan.bolser at gmail.com  Thu Dec  3 11:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
	<320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!


From dan.bolser at gmail.com  Thu Dec  3 11:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields <cjfields at illinois.edu>:
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs


From florent.angly at gmail.com  Thu Dec  3 13:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
	<4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding 
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related 
modules use LocatableSeq objects. They specify the sequence start but 
not the sequence end. This works in most cases, but I've recently 
encountered occasional error messages related to not having 
explicitly set the end of the sequence. I've been unable to put 
together a small test case to reproduce the bug easily.

My question is: if the start of the sequence is set, is it mandatory to 
set the end of the sequence? If so, then maybe the documentation needs 
to be explicit about it and maybe there needs to be a check that 
enforces that the end is set. In fact, it seems like if I provide a 
sequence and its start position, the LocatableSeq code should be able to 
automatically calculate its end, no?
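
To make the arithmetic behind that last question concrete, here is a minimal sketch of deriving the end from the start and the sequence string. It is not a statement about what LocatableSeq does internally: implied_end is a hypothetical helper, and treating only '-' and '.' as gap characters is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: end coordinate implied by a start position and a
# (possibly gapped) sequence string. Gaps do not consume coordinates.
sub implied_end {
    my ($start, $seq) = @_;
    (my $residues = $seq) =~ s/[-.]//g;    # assumed gap characters: '-' and '.'
    return $start + length($residues) - 1;
}

printf "%d\n", implied_end(10, 'AC--GT.A');    # 5 residues starting at 10 -> 14
```

Note that an empty or all-gap sequence gives an end of start - 1, which is one case where an automatic calculation would still need an explicit rule.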

Florent


Steffen Heyne wrote:
> Hello,
>
> so I tried to fix the problem with the location. Currently it works for
> me with the following changes:
>
> LocatableSeq.pm
>
> sub get_nse{
>
> ...
>
> 	my $ret;
> 	if ($self->strand() >= 0) {
> 		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
> 	} else {
> 		$ret = $id . $v. $char1 . $end . $char2 . $st ;
> 	}
> 	return $ret;
> }
>
> Then I noticed while using $aln->remove_seq() that it cannot remove a
> sequence, because it uses the wrong NSE to look up sequences. I changed
> the following:
>
> SimpleAlign.pm
>
> sub remove_seq {
>
> ...
> 	$id = $seq->id();
>     	$start = $seq->start();
>     	$end  = $seq->end();
>
> ## changed code:
>
> 	my $v = $seq->version ? '.'.$seq->version : '';
>     	if ($seq->strand >=0){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
> 	} elsif ($seq->strand == -1){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
> 	}	
> ...
>
> }
>
> The above change in LocatableSeq.pm works when I read an alignment in
> Stockholm format and write it out in ClustalW format. But when I read
> an alignment in ClustalW format and write it out as Stockholm (or
> something else), it did not work, because the strand is not set
> correctly in ClustalW::next_aln. It works with the following changes:
>
> ClustalW.pm
>
> sub next_aln{
>
> ...
>
> 	my ( $sname, $start, $end, $strand );	## strand added
> 	$strand = 0;				## new, standard = 0???
>     	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
>         if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
>         	( $sname, $start, $end ) = ( $1, $2, $3 );
> 		$strand = 1;			## new			
> 		if ($start > $end) {		## new
>        		($start, $end, $strand) = ($end, $start, -1); ##new
> 		}				## new
> 	
>       }
>         else {
>             ( $sname, $start ) = ( $name, 1 );
>             $strand = 0;    ## reset: no coordinates in the name, so do not reuse the previous strand
>             my $str = $alignments{$name};
>             $str =~ s/[^A-Za-z]//g;
>             $end = length($str);
>         }
>
>         my $seq = Bio::LocatableSeq->new(
>             -seq   => $alignments{$name},
>             -id    => $sname,
>             -start => $start,
>             -end   => $end,
> 	    -strand=> $strand			## new
>         );
>
> ...
>
> }
>
> So I don't know whether I changed things in the right places; I only
> found them because I happened to use certain functions. I don't know
> how broadly a changed NSE in LocatableSeq.pm affects other modules and
> functions. But I'm happy with my changes (so far :-)...).
>
> Will you change this in the way you proposed in bioperl trunk?
>
> Thanks!
>
> steffen
>
>
> Chris Fields schrieb:
>   
>> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>>
>>     
>>> Hi,
>>>
>>> I'm using Bioperl for my research and it is very useful! Thank you!
>>>
>>> Currently I have a problem with locations tags of sequences. I read in
>>> seed alignments of Rfam (in stockholm format, but I think it is
>>> similar to other formats).
>>>
>>> If the location is like:
>>>
>>> AB194432.1/908-846
>>>
>>> the start/end values are changed to
>>>
>>> $seq->start = 846
>>> $seq->end = 908
>>>
>>> and therefore the new location (e.g.$seq->get_nse) is:
>>>
>>> AB194432.1/846-908
>>>
>>> The $seq->strand tag is correctly set to -1 in this case, but if the
>>> alignment is written out again (clustal, stockholm,...) this strand
>>> info is lost and the sequences have this "wrong" location. But this
>>> information is important in respect to the sequence accession number.
>>>
>>> Is there a way to set the location back to the original one or is this
>>> behavior desired? Any manually setting with $seq->start($val) failed
>>> due to automatic checking.
>>>
>>> I'm using bioperl 1.6.1
>>>
>>> Thanks!
>>>
>>> steffen
>>>       
>> This is a definite bug. We recently discussed amending the NSE format
>> due to this (the subject came up over the last few months or so); it's
>> fallen through the cracks.  Fortunately it is very easy to fix (the
>> relevant method is in LocatableSeq).
>>
>> Does anyone have a problem with me adding this in?  It will change
>> output for only those instances where the strand is -1, so
>>
>> AB194432.1/908-846
>>
>> would be start = 846, end = 908, strand = -1
>>
>> AB194432.1/846-908
>>
>> would be start = 846, end = 908, strand = 1
>>
>> chris
>>     
>
>
>   


From cjfields at illinois.edu  Thu Dec  3 23:16:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 22:16:48 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu>


On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote:

> 2009/12/3 Chris Fields <cjfields at illinois.edu>:
>> Dan,
>> 
>> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:
> 
> ...
> 
>>> I'm wondering, because with the FlaggedRevs extension you could
>>> basically build a whole release in the wiki. Which would be fun if
>>> nothing else!
>> 
>> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (
> 
> I never said it would be beneficial, only that it would be fun.
> 
> http://www.mediawiki.org/wiki/Flaggedrevs

Ah, okay, that makes some sense.  

Just to stay on subject: I committed a fix (r16439) to bioperl-live that addresses the additional newline issue.

chris


From rtbio.2009 at gmail.com  Fri Dec  4 08:57:21 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 4 Dec 2009 14:57:21 +0100
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
Message-ID: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>

Hello all,

I am working on remote BLAST. I am trying to pass two parameters into the
remote BLAST code:

1. The input sequence that has to be sent to BLAST

2. The organism to restrict the search to (for example, Trypanosoma brucei)

When I take the organism parameter as input from the user through a web
page, the remote BLAST does not give any results, i.e. it says that no
alignments were found.

But when I hard-code the organism in the code, it gives me results, i.e.
3 hits.

I cannot understand this problem. Could anybody please help me in this
regard?
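
One thing that may be worth checking in the code below is how the ENTREZ_QUERY string is built: it is written in single quotes, and Perl does not interpolate variables inside single-quoted strings. The sketch below is plain Perl with an illustrative organism value and makes no BLAST request; whether this is the actual cause of the missing hits is an assumption.

```perl
use strict;
use warnings;

my $organism = 'Trypanosoma brucei';    # illustrative value only

# Single quotes: the variable name is kept literally, not replaced.
my $literal = '$organism[ORGN]';

# Concatenation: the value of $organism followed by the literal tag.
# (Writing "$organism[ORGN]" in double quotes would not help either,
# because Perl would read it as an element of an array @organism.)
my $query = $organism . '[ORGN]';

print "$literal\n";    # prints: $organism[ORGN]
print "$query\n";      # prints: Trypanosoma brucei[ORGN]
```

A hard-coded organism name in the same single-quoted string would be sent as written, which would be consistent with the hard-coded version returning hits.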

My code is

sub blastcode
{

$input1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
               print OUTFILE @params;
              close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a paramter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
#change a paramter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-Organism' => $organism );

while (my $input = $str->next_seq())

{
#Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

   # my $r = $factory->submit_blast('amino.fa');

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

      #    open(BLASTDEBUGFILE,'>',$debugfile);
       #   print BLASTDEBUGFILE $result->next_hit();
        #  close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);
$factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);
}

Regards,
Roopa.


From cjfields at illinois.edu  Fri Dec  4 09:59:17 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Dec 2009 08:59:17 -0600
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
In-Reply-To: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
References: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu>

Roopa,

At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports).  See here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155

Also, are the returned hits specific for the genome?  You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why):

http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi
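
One more thing that may be worth checking in the posted code, offered as a guess rather than a confirmed diagnosis: the ENTREZ_QUERY value is written in single quotes ('$organism[ORGN]'), and Perl does not interpolate variables inside single quotes, so the literal text would be sent instead of the organism name.  Plain double quotes would not help either, because "$organism[ORGN]" is parsed as an element of an array @organism.  The sketch below builds the string by concatenation; the helper name entrez_organism_query is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn user input into an
# Entrez organism filter such as "Trypanosoma brucei[ORGN]".
sub entrez_organism_query {
    my ($organism) = @_;
    $organism =~ s/^\s+|\s+$//g;    # trim whitespace left over from a web form
    return $organism . '[ORGN]';    # concatenate, so [ORGN] is never read as an index
}

my $organism = 'Trypanosoma brucei';

# Single quotes: nothing is interpolated, the literal characters are kept.
my $literal = '$organism[ORGN]';

my $query = entrez_organism_query(" $organism\n");

print "single-quoted: $literal\n";
print "concatenated:  $query\n";

# With BioPerl installed, the value would then be assigned as in the post:
#   $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = $query;
```

If the web form delivers the organism with stray whitespace or a trailing newline, the trimming step may matter as much as the quoting.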

chris 
 


From robert.bradbury at gmail.com  Fri Dec  4 13:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID: <deaa866a0912041027r71c49f58n7d467f050c2f49c6@mail.gmail.com>

Background:
I have been involved in aging research off and on for ~16 years.  My initial
focus was the eventual decline of the "program" (because DNA has no ECC
and only limited redundancy), so my initial work (in the early 1990s)
focused on DNA repair genes (of which there are about 150 in the human
genome) [1,2].  Most recently I have focused on the DNA double-strand
break repair process (NHEJ) as a fundamental cause of aging because it may
corrupt the genomes of individual cells.  (And as most
programmers would agree -- break the code and you break the program.)
Michael Lieber at UCLA has estimated that by the time a human is ~70, on the
order of several hundred genes in one's cells have been corrupted (which may
have an indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene,
there are on the order of 18 SNPs and 8 possible phosphorylation sites (not
to mention other potential modification sites).  Add to this the
fact that methionine and tryptophan, and to a lesser extent cysteine, are more
susceptible to single-base mutations (because even a single-base
mutation or repair alters the codon->amino acid coding).  There are
various programs to analyze such proteins for the critical sites -- SIFT and
the various programs linked from their sites.  Now it seems to me that one
could attack this problem by integrating SNPs, mutations, etc. at the
critical sites (where "critical" may or may not coincide with normal SNPs,
which presumably are primarily at non-critical sites) and identifying those
proteins where changing the coding sequence to non-synonymous amino acids
potentially breaks the protein (the real interpretation of which will not be
understood until population studies are done).

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is
there not a BioPerl function which simply enables a visual interpretation of
the critical sites of the protein?"  I.e. some color-coded representation of
the protein (which presumably has some augmented functionality to determine
things like probability or statistical information).  I.e. hand the function
a .fasta file and it will give you a visual (colored) analysis of the
critical nature of specific a.a. -- something which could be used in
genomic or SNP analysis (such as I presume is being done by 23andMe, as
well as other organizations) to begin to separate out the variations in the
human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and to a lesser extent Perl experience to
contribute to this -- I lack the BioPerl wisdom to make it generally
available.

If anyone has some suggestions as to what functions/modules might be of use
(in providing a "single-look" view of gene a.a. whose mutations may be more
or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press
(2006)
2. "Aging of the Genome",  J. Vijg, Oxford University Press (2007)


From maj at fortinbras.us  Sun Dec  6 17:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 6 Dec 2009 17:54:00 -0500
Subject: [Bioperl-l] bioperl-mode new feature: base class browsing
Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife>

Hi All, 
You can now browse pod of the base/parent classes of bioperl modules
with one keystroke using the latest update of bioperl-mode.
See http://bioperl.org/wiki/Emacs_bioperl-mode
Press "B" or "P" while in pod view to get a completion list 
of the parent classes for the module whose pod you're viewing.
cheers, 
MAJ


From mmokrejs at ribosome.natur.cuni.cz  Mon Dec  7 15:33:48 2009
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Mon, 07 Dec 2009 21:33:48 +0100
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
References: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz>

Hi,
  I just stumbled across this older posting ... maybe you want to exploit
SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has
a remote API available.
Martin

Robert Bradbury wrote:
> I would like to know whether or not anyone has attempted to create a
> "generalized" reciprocal blast component for BioPerl?
> 
> One sees papers all the time where they discuss running reciprocal blasts to
> compare a new species to an old "standard" species or a set of species or
> running an all-to-all set of comparisons to match up all of the "known"
> proteins from species and determine which are outliers (and therefore
> "novel").  There are also accumulating merged sets in NCBI HomoloGene (which
> seems to be some strict subset (perhaps a dozen) of "well sequenced" genomes)
> and Ensembl (which seems to be working with a much larger set of 40-50
> genomes, some of which may be somewhat incomplete and are certainly poorly
> "explored").
> 
> I have, I believe, seen code "fragments" from various authors, perhaps some
> on the BioPerl list, which perform some major subset of a typical
> "reciprocal blast".
> 
> Now what I am looking for is a relatively generalizable some-to-some
> reciprocal blast utility.  I want to be able to specify the genes (or gene
> family), e.g. some of the ~150 known DNA repair genes.  It would be helpful
> to also specify how "tolerant" the blast "true reciprocal" criteria are.
> There are some genes where there is a very strict 1-to-1 relationship across
> many genomes.  But for genes which involve relatively standard domains, e.g.
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
> example it's more like 5-to-5 and it would be really nice to be able to
> specify the strictness or quality level [1] for "matching" genes (and even
> which genes are to be excluded because they are known to be false
> homologues).
> 
> Then to top this off I want to be able to combine known public (e.g.
> HomoloGene / UniGene / Ensembl) databases with perhaps local private
> databases or database subsets (e.g. emerging or specialized genomes).
> 
> The goal here of course is to determine the precise phylogenetic relationships
> between all of the DNA repair genes and how there may be gain / loss /
> evolution of function that can be related to species characteristics (size,
> longevity, etc.).
> 
> Is there a generalized reciprocal blast component in BioPerl?  Or is it a
> "build-it-yourself" situation (that I have to believe has been built
> probably a few dozen times by various researchers / organizations /
> companies)?
> 
> Thanks,
> Robert Bradbury
> 
> 1. This would be handled in BioPerl with a customizable user function which
> could be tailored to handle specific cases -- for example a function which
> when handed a set of 100 potential "matches" could go through those 100
> matches, identify common domains, and then "re-rate" matches based on
> considerations such as the type and number of common domains, domains being
> in the same order, etc.  I.e. criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.


From robert.bradbury at gmail.com  Mon Dec  7 15:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>

This comment could also have the subject line: "Why does Bioperl/get_sequence
fork at all?"  Why are not all operations sequential?  And if this is a
"default" mode that I'm unaware of -- how do I ever write a reliable BioPerl
script if I have little or no control over what the program uses when it
runs?  I may have days, so I can bear the burden of relatively slow results
(and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a
subset of the NCBI sequences.  It "mostly" works, in that it returns a
seemingly complete .bls result file but when attempting to look at the
sequences (so it can more accurately summarize the information from the
results than a standard blast report allows) it terminates prematurely with
errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
    $seq = get_sequence('GenBank', $accsn);

Now this could be a problem if NCBI/GenBank fails due to load conditions,
but this specific failure (which is repeatable) is most likely due to hitting
the user process limit.  Small BLAST results work
fine -- it's only when the BLAST has returned several hundred hits that it runs
into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
queries (to get a sequence) with complete disregard of the environment
(process limits, NCBI limits, etc.).  But I do not know enough about how
this works to point a finger at some specific function.  It appears that
get_sequence results are accumulated, summarized, etc. without the
respective wait()-variant calls ever being issued to collect former children.
[This IMO would clearly be a bug.]

This could be addressed by allowing the BioPerl library to run in 3 modes.
(1) Completely synchronous -- if you fork, you wait until it's done and
you collect "it"; if any fork fails, then one either collects the process or
switches to the non-conservative mode.
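
The "completely synchronous" mode described above can be sketched in plain Perl as follows.  This is only an illustration of the fork-then-wait pattern under the assumption of a Unix-like system; it is not how BioPerl manages its children internally, and the name run_one_at_a_time is made up for the example.

```perl
use strict;
use warnings;
use POSIX ();

# Run each job in a child process, one at a time, and reap the child
# before starting the next one, so at most one child exists at any moment.
sub run_one_at_a_time {
    my @jobs = @_;
    my @exit_codes;
    for my $job (@jobs) {
        my $pid = fork();
        die "Couldn't fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work, then leave immediately without running
            # END blocks or destructors inherited from the parent.
            my $code = eval { $job->() };
            POSIX::_exit(defined $code ? $code : 1);
        }
        waitpid($pid, 0);           # parent blocks until the child is collected
        push @exit_codes, $? >> 8;  # exit status of the child just reaped
    }
    return @exit_codes;
}
```

Because every child is collected with waitpid before the next fork, the number of outstanding processes cannot grow with the number of hits, which is the property being asked for.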

Robert


From cjfields at illinois.edu  Mon Dec  7 16:08:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 7 Dec 2009 15:08:40 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <A36A88C9-D94C-4559-A629-56EB8F374DAC@illinois.edu>

Robert, 

If you use the relevant components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not.  All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly.

See the POD for those specific modules for more information.
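
As a rough sketch of what using the modules directly might look like: the helper below fetches accessions strictly one after another from any object that offers get_Seq_by_acc.  The helper name fetch_sequentially and the pause length are made up for the example.  The commented usage assumes, from my reading of the Bio::DB::WebDBSeqI POD, that the -retrievaltype option accepts 'tempfile' or 'io_string' as alternatives to the forking 'pipeline' mode; please check the POD of your installed version.

```perl
use strict;
use warnings;

# Fetch sequences one at a time, pausing between requests.
# $db is any object with a get_Seq_by_acc method (for example a
# Bio::DB::GenBank instance); $pause is the delay in seconds.
sub fetch_sequentially {
    my ($db, $pause, @accessions) = @_;
    my %seq_for;
    for my $acc (@accessions) {
        # Trap exceptions so one failed lookup does not abort the whole run.
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        warn "Could not fetch $acc: $@" unless defined $seq;
        $seq_for{$acc} = $seq;
        sleep $pause if $pause;
    }
    return \%seq_for;
}

# Usage with BioPerl installed (not run here; option value is an assumption):
#   use Bio::DB::GenBank;
#   my $gb   = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
#   my $seqs = fetch_sequentially($gb, 3, @accessions);
```

Failed lookups are stored as undef, so the caller can retry them later instead of losing the whole batch.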

chris



From jason at bioperl.org  Mon Dec  7 16:24:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Dec 2009 13:24:54 -0800
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>

Robert -

You seem to be mixing up the remote BLAST and the sequence
retrieval problems. These messages are related to the remote retrieval
of sequences.
  It is hard to tell from your message specifically which modules you
are using or how you are querying NCBI - there are several ways to do
this, either with the NCBI tools or with Bio::DB::GenBank.
  If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to adhere to the wait variant that
NCBI requests, but I don't think the Bio::DB::GenBank get_Seq_by_acc method
does any such thing (at least when it was originally written).

I always advocate that if you want highly available and reliable access to
sequences, you should download nr or whichever DB and use the local
indexing tools for the retrieval.  Once you start doing hundreds of
queries I don't see any good reason to be doing the query against NCBI
directly, given the unreliability of the web and services. Local
databases are faster and more reliable for most people, so I urge you
to take advantage of the tools which provide local database access with
the same APIs.


I would like to comment that the tone of your posts to the list is
not particularly helpful.  I wonder if you are actually asking for
help or just interested in complaining when things don't work as
you expect? This is a collaborative, volunteer-only project, built on
the principle of working together to make a useful toolkit.  We
encourage you to build programs and applications from this base that
suit your needs, but not everything will be implemented directly in
the toolkit if it isn't generic enough (at least that is my
feeling; the other core devs help with these decisions).
   If there is a useful, generic, and reusable part, we would like that
to be part of the API. Otherwise we suggest a new application that
fits the developer's vision. We encourage you to write (and publish)
that application separately, but certainly encourage bug reports (and fixes)
and also code contributions for new features where they
can be seen as generally useful.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From Jonas_Schaer at gmx.de  Tue Dec  8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read in some FASTA files. At first it worked fine, but now I have some FASTA files in a slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/
DB/Fasta.pm line 1054.


Is there any way to use these FASTA files with different line lengths with this fasta.pm module, or will I have to change the format of my FASTA files (big databases...)?

Thanks in advance for any help! 

Regards, Jonas


From awitney at sgul.ac.uk  Tue Dec  8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID: <DB3D347F-EB9E-4A59-87D2-3E1A5FACF154@sgul.ac.uk>

Hi,

I have been generating some trees with Phylip (pars) and then  
processing them with Bioperl. These trees are generated by comparing  
multiple strains of a bacterial organism by presence/absence (0/1)  
calls for each gene.

I was wondering if there is any package in BioPerl to try to
determine whether any specific genes are associated with specific branches
of the trees? Or if anyone knows of another tool that can do this?

thanks for any help

adam


From jason at bioperl.org  Tue Dec  8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>

You can run
sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (it
is also installed when you install the BioPerl scripts):
$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or
$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa


-jason
On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:

> Hi there,
> I have a little question concerning bioperl. I have  
> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read  
> in some fasta files. first it worked fine, but now i have some  
> fastafiles in slightly different format (not all lines have the same  
> length!).
>
> ------------- EXCEPTION -------------
> MSG: Each line of the fasta entry must be the same length except the  
> last.
>    Line above #49 '
> ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ 
> Fasta.pm:771
> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
> STACK main::readfasta blast_eval.pm:174
> STACK toplevel blast_eval.pm:83
> -------------------------------------
>
> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ 
> site/lib/Bio/
> DB/Fasta.pm line 1054.
>
>
> Is there any way to use these fasta files with different length of  
> lines with this fasta.pm module or will I have to change the format  
> of my fasta files (big databases...)?
>
> Thanks in advance for any help!
>
> Regards, Jonas
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Tue Dec  8 23:30:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Dec 2009 22:30:26 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference
Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu>

All,

For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego.  This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13.  The exact day and time are somewhat flexible depending on attendees' schedules.

For those interested, sign up here:

http://www.bioperl.org/wiki/GMOD_2010_Meeting

For those interested in attending the GMOD meeting or PAG:

http://gmod.org/wiki/January_2010_GMOD_Meeting

I can envision the following items popping up:

* Refactoring of Alignment and GFF3/FeatureIO
* Addressing BioPerl's monolithic nature
* Moose and Perl 6
* Documentation

Any others?

chris


From akarger at CGR.Harvard.edu  Wed Dec  9 10:01:45 2009
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 9 Dec 2009 10:01:45 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>

> Is there any way to use these fasta files with different length of
> lines with this fasta.pm module or will I have to change the format
> of my fasta files (big databases...)?
> 

Jonas,

It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back.

To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy).

The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like).

Let me know if you have a problem.

-Amir Karger
Life Sciences Research Computing, FAS IT
Harvard University
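
[Note: a rough core-Perl sketch of the same FASTA -> tab -> FASTA round trip, for readers who cannot reach the Scriptome page. This is not the Scriptome code. fasta_to_tab and tab_to_fasta are hypothetical helper names, and the default width of 60 only mirrors the figure mentioned in the message above.]

```perl
use strict;
use warnings;

# Collapse each FASTA record to one "header<TAB>sequence" row.
# Hypothetical helper; assumes headers contain no tab characters.
sub fasta_to_tab {
    my ($fasta) = @_;
    my @rows;
    for my $line (split /\r?\n/, $fasta) {
        next unless length $line;
        if ($line =~ /^>(.*)/) {
            push @rows, [$1, ''];        # new record: header, empty sequence
        }
        elsif (@rows) {
            $line =~ s/\s+//g;           # drop stray whitespace
            $rows[-1][1] .= $line;       # append to the current sequence
        }
    }
    return join '', map { "$_->[0]\t$_->[1]\n" } @rows;
}

# Expand tab rows back to FASTA with a fixed number of residues per line.
sub tab_to_fasta {
    my ($tab, $width) = @_;
    $width ||= 60;
    my $out = '';
    for my $row (split /\r?\n/, $tab) {
        my ($header, $seq) = split /\t/, $row, 2;
        next unless defined $seq;
        $out .= ">$header\n";
        $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    }
    return $out;
}

my $tab = fasta_to_tab(">s1 demo\nACG\nTACGT\n>s2\nGG\n");
print tab_to_fasta($tab, 5);
```

Because every record is rewritten at one width, the output should satisfy the equal-line-length rule in the exception. For very large files the same logic can be applied one record at a time instead of on an in-memory string.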


From Kevin.M.Brown at asu.edu  Wed Dec  9 10:26:22 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 9 Dec 2009 08:26:22 -0700
Subject: [Bioperl-l] fasta format
In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>

Even easier to accomplish in one step. Read in the fasta file and output
it right to another fasta file with SeqIO:

use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $file);
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fasta');
# next_seq (not next) returns the next sequence object, or undef at end of file
while (my $seq = $in->next_seq) { $out->write_seq($seq); }

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  



From Russell.Smithies at agresearch.co.nz  Wed Dec  9 14:44:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Dec 2009 08:44:41 +1300
Subject: [Bioperl-l] fasta format
In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
	<1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>

It's even easier as the script is already written for you :-)

bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa


--Russell




From maj at fortinbras.us  Wed Dec  9 15:18:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 9 Dec 2009 15:18:08 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
	<18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife>

$ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, 
">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas



From kellert at ohsu.edu  Wed Dec  9 19:36:13 2009
From: kellert at ohsu.edu (Tom Keller)
Date: Wed, 9 Dec 2009 16:36:13 -0800
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>

Greetings,
Is there a simple way to map a list of ensembl ids to the NCBI gis?

thanks,
Tom

Thomas (Tom) Keller
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/


From cjfields at illinois.edu  Wed Dec  9 20:59:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 9 Dec 2009 19:59:37 -0600
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu>

Tom,

Probably best to do this via BioMart:

http://www.ensembl.org/biomart/

I would assume you can also do this via the ensembl perl API as well.

Also, have a look at the UniProt ID Mapper:

http://www.uniprot.org/?tab=mapping

chris

On Dec 9, 2009, at 6:36 PM, Tom Keller wrote:

> Greetings,
> Is there a simple way to map a list of ensembl ids to the NCBI gis?
> 
> thanks,
> Tom
> 
> Thomas (Tom) Keller
> kellert at ohsu.edu
> 503.494.2442
> 6339b R Jones Hall (BSc/CROET)
> www.ohsu.edu/xd/research/research-cores/dna-analysis/


From lovebaby39 at gmail.com  Thu Dec 10 09:22:14 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 22:22:14 +0800
Subject: [Bioperl-l] about bioperl issue
Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC>

Dear 

The following is code. 


--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
            'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  =>"test_query",
                       -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

 while (my $result_rb =  $blast_report_rb-> next_result ) {
  while (my $hit_rb = $result_rb->next_hit()){
   while (my $hsp_rb = $hit_rb->next_hsp()){
    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
    #print " ",$hit->name,"\n";
   }
  }
 }

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and "score", but I don't know how to get the text shown in red (or please see the attachment).
------------------------------------------------------------------------------------------------------------------ 
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------ 

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091210/0431bad7/attachment-0002.txt>

From SMarkel at accelrys.com  Thu Dec 10 09:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red but the three strings in the
pairwise alignment display can be obtained from an HSP using

    $hsp->query_string()
    $hsp->hit_string()
    $hsp->homology_string()
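
[Note: for readers unsure how the three strings relate, the sketch below rebuilds a blastn-style middle line from the two aligned strings, using the first 25 columns of the alignment in the original question. It is core Perl only. midline is a hypothetical helper for illustration, not a BioPerl method; in real code the middle line would come from homology_string(), and for protein searches it contains letters and '+' rather than only '|'.]

```perl
use strict;
use warnings;

# Build a blastn-style midline from two equal-length aligned strings:
# '|' where the characters match, a space otherwise (gaps never match).
# midline is a hypothetical helper, not part of BioPerl.
sub midline {
    my ($query, $hit) = @_;
    die "aligned strings differ in length\n"
        if length($query) != length($hit);
    my $mid = '';
    for my $i (0 .. length($query) - 1) {
        my ($q, $h) = (substr($query, $i, 1), substr($hit, $i, 1));
        $mid .= ($q eq $h && $q ne '-') ? '|' : ' ';
    }
    return $mid;
}

# First 25 aligned columns from the question (Query 147.., Sbjct 114..)
my $query = 'ctctttactcttaggtttacccgcc';
my $hit   = 'ctcttttctcttaggtttacccgcc';
print "$query\n", midline($query, $hit), "\n$hit\n";
```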

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics




From David.Messina at sbc.su.se  Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.


Dave


From David.Messina at sbc.su.se  Thu Dec 10 10:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for,  I think.

Namely:
   $hsp->query_string()
OR
   $hsp->hit_string()


Dave


On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
> 
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
> 
> Thank you
> 
> Reginald Hsueh
> 
> ------------------------------------------------------------------------------------------------------------------------------
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>                  |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
> ------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> 
> --------------------------------------------------
> From: "Dave Messina" <David.Messina at sbc.su.se>
> Sent: Thursday, December 10, 2009 11:09 PM
> To: "Hsueh" <lovebaby39 at gmail.com>
> Cc: <bioperl-l at bioperl.org>
> Subject: Re: [Bioperl-l] about bioperl issue
> 
>> Hi Reginald,
>> 
>> None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.
>> 
>> Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.
>> 
>> 
>> Dave


From lovebaby39 at gmail.com  Thu Dec 10 10:53:00 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 23:53:00 +0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <AEA3314B45B14452A4BD1E3A2235AA5D@SHAPC>

Dear Dave Messina

Thank you for your replies.

Reginald Hsueh



From pg4 at sanger.ac.uk  Thu Dec 10 15:50:40 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT)
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
References: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
Message-ID: <alpine.DEB.1.10.0912102042180.8440@deskpro17122.dynamic.sanger.ac.uk>


If you are mapping Ensembl genes to NCBI genes (via the Ensembl API or BioMart), 
please read this recent thread on ensembl-dev:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html

It seems that the Ensembl gene mapping to NCBI is done through translation, so 
noncoding genes do not have a corresponding NCBI gene mapped.


   -Pablo



====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin at sanger.ac.uk
====================================================================




From umjsm at leeds.ac.uk  Fri Dec 11 11:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to write
one chain of a PDB file to a new file. I have tried a lot of things, but what I
think should work is the following script:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", -format => 'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');

for my $chain ($struc->get_chains) {
	if($chain->id eq "A"){
		$new_entry->chain($chain);
		last;
	}
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');#
$out->write_structure($new_entry);

It doesn't. I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK:
Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK:
Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK:
Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of
Bio::Structure::Entry requires an object of type Chain as input.

Any solution will be very welcome.

best regards,
Joan


From wkretzsch at gmail.com  Fri Dec 11 14:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files
	generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community.  I've created a perl module that
reads in msOUT files generated by Hudson's ms.  As far as I
understand, there is no SeqIO module to read and output these files?
If so, I propose to create a module that does this.  Any suggestions?

Thanks,
Warren Kretzschmar


From maj at fortinbras.us  Fri Dec 11 14:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT
	filesgenerated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,
I say go for it. You'll want to have a look at
http://bio.perl.org/wiki/Advanced_BioPerl
which explains most of our tips and "policies" for prospective
code contributors, as well as
http://bio.perl.org/wiki/HOWTO:SeqIO
which details SeqIO from the user's perspective. Look
carefully at some Bio::SeqIO::* modules for implementation
details. If you have code to propose, use
http://bugzilla.bioperl.org
and enter a new enhancement, where you can upload
your module for us to review.
MAJ
----- Original Message ----- 
From: "Warren W. Kretzschmar" <wkretzsch at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 11, 2009 2:22 PM
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT filesgenerated by 
Hudson's ms


> Hi,
> I'm new to the bioperl community.  I've created a perl module that
> reads in msOUT files generated by Hudson's ms.  As far as I
> understand, there is no SeqIO module to read and output these files?
> If so, I propose to create a module that does this.  Any suggestions?
>
> Thanks,
> Warren Kretzschmar
>
> 


From bosborne11 at verizon.net  Fri Dec 11 15:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>

Joan,

It looks to me like the first argument to the add_chain() method has  
to be a Model object, the second is the Chain itself. See Structure/ 
Entry.pm, for example. However if you're seeing some documentation  
that says something else then tell us where, it needs to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which
has one or more Chains. This allows you to build macromolecular
complexes (an Entry), which could have more than one defined protein
or protein complex (Models).

Brian O.

On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:

> Hello,
>
> I am trying to do a very easy think but I don't get it. I want to  
> write
> in a file a chain of a pdb. I have try a lot of thinks but what I  
> think
> that it should work is the next script:
>
> use Bio::Structure::IO;
> use strict;
>
> my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> =>
> 'pdb');
> my $struc = $structio->next_structure;
>
> my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
>
> for my $chain ($struc->get_chains) {
> 	if($chain->id eq "A"){
> 		$new_entry->chain($chain);
> 		last;
> 	}
> }
>
> my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> 'pdb');#
> $out->write_structure($new_entry);
>
> it doesn't. I get the next error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: add_chain: first argument needs to be a Model object ()
>
> STACK: Error::throw
> STACK:
> Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> 368
> STACK:
> Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:335
> STACK:
> Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:391
> STACK:
> Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:304
> STACK: read_pdb.pl:10
> -----------------------------------------------------------
>
> As far I understand the documentation, the method chain of the object
> Bio::Structure::Entry requires an as input an object of type Chain.
>
> Any solution will be very welcome.
>
> best regards,
> Joan
>


From awitney at sgul.ac.uk  Sun Dec 13 16:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you
see after clustering data.

I was wondering if anyone has managed to do this using bioperl? I can
draw the two separately, but can't quite seem to work out how to put the
two together and get the nodes to line up with the correct row of
clustering data.

Is there any particular module to look at?

thanks for any help

adam


From dhwani1030 at gmail.com  Sat Dec 12 15:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>

Hi,
I am very new to Bioperl, though I am somewhat familiar with Perl.

I write my perl programs in Notepad++ and run them in cmd.

Now, I want to run Bioperl programs. I just installed bioperl on my
computer. And I have a program using bioperl modules in Notepad++.

My question is how to run these programs. Can they be run in cmd as well, or
do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.


From eric_donaldson at med.unc.edu  Sun Dec 13 18:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID: <f77787b07d66b.4b252f3c@med.unc.edu>

Hello,

Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.


I am a novice at unix and bioperl, so I do not know how to troubleshoot this. Would you please help me?

Thank you,

Eric


Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From jason at bioperl.org  Sun Dec 13 20:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f77787b07d66b.4b252f3c@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was  
deprecated several releases ago.
It was replaced with Bio::SearchIO

-jason
On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:

> Hello,
>
> Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I  
> used the
>
> fink install bioperl.pm-588 as I could not get it to instal using  
> the perl version 5.10.
>
> But now I get an error when trying to run a bioperl script.
>
> Here is the error:
>
> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/ 
> perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / 
> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ 
> darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ 
> Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at  
> blastparser.pl line 8.
> BEGIN failed--compilation aborted at blastparser.pl line 8.
>
>
> I am a novice at unix and bioperl so I do not know how to  
> troubleshoot this, would you please hleo me?
>
> Thank you,
>
> Eric
>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Sun Dec 13 23:09:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 20:09:45 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f79059397d7fa.4b255f0b@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>

So did you install perl-5.10, or are you using the system perl?  I'm
confused about whether you actually installed bioperl.pm via fink.

Your @INC or $PERL5LIB points to /sw/lib/perl5, which is one of the
dirs it would have installed in, but I don't think you actually
installed bioperl.

you can try and do:
$ locate Bio/SearchIO.pm

We'll see if any of the other osx/fink gurus are on the list that can  
help or you can install it via CPAN I guess.

-jason
On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:

>
> I actually tried a different blastparser that uses BIO::SearchIO and  
> got the same message:
>
> Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ 
> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / 
> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ 
> darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ 
> Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at  
> blastparser.new.pl line 8.
> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>
> I suspect there is a path problem, but am not savvy enough to know  
> how to fix it.  I am really just a hacker.... I have several scripts  
> that I use regularly and that I know how to modify, but am lost when  
> they don't work...
>
> Thanks for any help,
>
> Eric
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 8:24 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: bioperl-l at bioperl.org
>
>> Hi Eric -
>>
>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>> was
>> deprecated several releases ago.
>> It was replaced with Bio::SearchIO
>>
>> -jason
>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>
>>> Hello,
>>>
>>> Today I downloaded bioperl 1.61 on my new macbook pro using
>> fink.  I
>>> used the
>>>
>>> fink install bioperl.pm-588 as I could not get it to instal
>> using
>>> the perl version 5.10.
>>>
>>> But now I get an error when trying to run a bioperl script.
>>>
>>> Here is the error:
>>>
>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>> /sw/lib/
>>> perl5/darwin-thread-multi-2level /sw/lib/perl5
>> /sw/lib/perl5/darwin /
>>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/5.10.0
>> /Library/Perl/5.10.0/
>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>> /Network/Library/
>>> Perl/5.10.0/darwin-thread-multi-2level
>> /Network/Library/Perl/5.10.0 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>> at
>>> blastparser.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>
>>>
>>> I am a novice at unix and bioperl so I do not know how
>> to
>>> troubleshoot this, would you please hleo me?
>>>
>>> Thank you,
>>>
>>> Eric
>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>>
>> < 
>> eric_donaldson.vcf>_______________________________________________>  
>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Mon Dec 14 00:10:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 21:10:54 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f7a30bbc786b3.4b258092@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>

Eric -
please CC the bioperl list when responding so others can help - I  
can't be the only answerer.

But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you  
would need to make sure that is added to your PERL5LIB.
There are some help docs on the perl sites I expect on how to get your  
PATHs in order.

Or you can just install via CPAN which will put it in the right path -  
there are docs on the bioperl website about installing via CPAN.
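
A minimal sketch of the per-script alternative to setting PERL5LIB, assuming
the modules really are under /sw/lib/perl5/5.8.8 as reported in this thread.
The helper find_module is made up for illustration; it only reports which
@INC directory, if any, holds a given module file.

```perl
use strict;
use warnings;

# Directory taken from this thread; adjust to wherever Bio/ actually lives.
# This has the same effect for this script as adding the directory to PERL5LIB.
use lib '/sw/lib/perl5/5.8.8';

# Hypothetical helper: return the first @INC directory that contains the
# module's .pm file, or nothing if it is not found.
sub find_module {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO
    $rel .= '.pm';
    for my $dir (@INC) {
        next if ref $dir;                # skip code-ref hooks in @INC
        return $dir if -e "$dir/$rel";
    }
    return;
}

for my $mod ('Bio::SearchIO', 'strict') {
    my $dir = find_module($mod);
    print defined $dir
        ? "$mod found in $dir\n"
        : "$mod not found in \@INC\n";
}
```

Note that modules built for one perl version (here 5.8.8) may not all load
under another (5.10), so this is a diagnostic aid rather than a guaranteed fix.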

-jason
On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted  
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ 
> SearchIO.pm  I cannot access it.  Do I need to use the older version  
> of perl?
>
> Would it be better to install with CPAN?  If so, can you send me to  
> a page that has instructions?
>
> Thank you so much!
>
> ERic
>
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 11:10 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: BioPerl List <bioperl-l at bioperl.org>
>
>> So you installed perl-5.10 or using system perl?  I'm
>> confused if you
>> actually installed bioperl.pm or not via fink?
>>
>> It seems like since your @INC or $PERL5LIB points to
>> /sw/lib/perl5
>> which is one of the dirs it would have installed in, but I don't
>> think
>> you actually installed bioperl.
>>
>> you can try and do:
>> $ locate Bio/SearchIO.pm
>>
>> We'll see if any of the other osx/fink gurus are on the list
>> that can
>> help or you can install it via CPAN I guess.
>>
>> -jason
>> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:
>>
>>>
>>> I actually tried a different blastparser that uses
>> BIO::SearchIO and
>>> got the same message:
>>>
>>> Can't locate Bio/SearchIO.pm in @INC (@INC contains:
>> /sw/lib/perl5/
>>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin
>> /
>>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/5.10.0
>> /Library/Perl/5.10.0/
>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>> /Network/Library/
>>> Perl/5.10.0/darwin-thread-multi-2level
>> /Network/Library/Perl/5.10.0 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-
>>
>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>> at
>>> blastparser.new.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>>>
>>> I suspect there is a path problem, but am not savvy enough to
>> know
>>> how to fix it.  I am really just a hacker.... I have
>> several scripts
>>> that I use regularly and that I know how to modify, but am
>> lost when
>>> they don't work...
>>>
>>> Thanks for any help,
>>>
>>> Eric
>>>
>>> ----- Original Message -----
>>> From: Jason Stajich <jason at bioperl.org>
>>> Date: Sunday, December 13, 2009 8:24 pm
>>> Subject: Re: [Bioperl-l] problem with install
>>> To: eric_donaldson at med.unc.edu
>>> Cc: bioperl-l at bioperl.org
>>>
>>>> Hi Eric -
>>>>
>>>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>>>> was
>>>> deprecated several releases ago.
>>>> It was replaced with Bio::SearchIO
>>>>
>>>> -jason
>>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Today I downloaded bioperl 1.61 on my new macbook pro using
>>>> fink.  I
>>>>> used the
>>>>>
>>>>> fink install bioperl.pm-588 as I could not get it to instal
>>>> using
>>>>> the perl version 5.10.
>>>>>
>>>>> But now I get an error when trying to run a bioperl script.
>>>>>
>>>>> Here is the error:
>>>>>
>>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>>>> /sw/lib/
>>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5
>>>> /sw/lib/perl5/darwin /
>>>>> Library/Perl/Updates/5.10.0
>> /System/Library/Perl/5.10.0/darwin-
>>>>
>>>>> thread-multi-2level /System/Library/Perl/5.10.0
>>>> /Library/Perl/5.10.0/
>>>>> darwin-thread-multi-2level /Library/Perl/5.10.0
>>>> /Network/Library/
>>>>> Perl/5.10.0/darwin-thread-multi-2level
>>>> /Network/Library/Perl/5.10.0 /
>>>>> Network/Library/Perl
>> /System/Library/Perl/Extras/5.10.0/darwin-
>>>>
>>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
>>>> at
>>>>> blastparser.pl line 8.
>>>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>>>
>>>>>
>>>>> I am a novice at unix and bioperl so I do not know how
>>>> to
>>>>> troubleshoot this, would you please hleo me?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>> Eric F. Donaldson, Ph.D.
>>>>> Research Assistant Professor, Ralph Baric Lab
>>>>> University of North Carolina
>>>>> Department of Epidemiology
>>>>>
>>>>>
>>>>>
>>>> <
>>>>
>> eric_donaldson.vcf>_______________________________________________>
>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>>
>>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>> <eric_donaldson.vcf>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From awitney at sgul.ac.uk  Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
References: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
Message-ID: <4B260713.3070402@sgul.ac.uk>


bioperl programs are just perl programs, so you should run them in
exactly the same way as your perl programs, from the command line

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to Bioperl but I am somewhat familiar to perl though.
> 
> I write my perl programs in Notepad++ and run them in cmd.
> 
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
> 
> My question is how to run these programs? Can they be ran in cmd as well? or
> do I use ppm?
> 
> Please help.
> 
> Thanks,
> -Dhwani Gandhi.


From umjsm at leeds.ac.uk  Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't use as an argument an object of type

Bio::Structure::Chain

I get an error like this (-->depends on the argument<--)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK:
Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------


And if I use a Chain object, I get the error from my first message.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
my $model = Bio::Structure::Model->new( -id  => '0');

for my $chain ($struc->get_chains) {
        if($chain->id eq "A"){
                $new_entry->add_chain($model,$chain);

                last;
        }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');
$out->write_structure($new_entry);


But I get an empty pdb

HEADER    DEFAULT CLASSIFICATION                  24-JAN-70   stru
REMARK   1
TER       1          A   0
MASTER
END

I am trying a lot of combinations, but I can't write a single chain into
a file. I don't know what I am doing wrong.
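
As a stopgap that avoids Bio::Structure altogether, the chain can also be
pulled out as plain text, since the PDB format keeps the chain ID in column 22
of the coordinate records. This is a sketch of that swapped-in approach, not
the Bio::Structure way; the helper name extract_chain and the three demo lines
are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the coordinate records (ATOM, HETATM, ANISOU,
# TER) whose chain ID, column 22 of the PDB line, matches $chain_id.
sub extract_chain {
    my ($chain_id, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        next unless $line =~ /^(?:ATOM  |HETATM|ANISOU|TER   )/;
        next unless length($line) >= 22;
        next unless substr($line, 21, 1) eq $chain_id;
        push @kept, $line;
    }
    return @kept;
}

# Made-up input lines, only to show the call. Real use would read the lines
# of 101m.pdb from a filehandle and print the result to out.pdb.
my @demo = (
    "ATOM      1  N   MET A   0      24.277   8.374  -9.854  1.00 38.41           N\n",
    "ATOM      2  CA  MET B   0      24.404   9.859  -9.939  1.00 37.90           C\n",
    "HETATM    3  O   HOH A 200      10.000  10.000  10.000  1.00 20.00           O\n",
);

print extract_chain('A', @demo), "END\n";
```

This drops the header and connectivity records, so it is only suitable when
the downstream tool needs nothing beyond the coordinates.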

Thanks for helping

regards,
Joan


On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> Joan,
> 
> It looks to me like the first argument to the add_chain() method has  
> to be a Model object, the second is the Chain itself. See Structure/ 
> Entry.pm, for example. However if you're seeing some documentation  
> that says something else then tell us where, it needs to be corrected.
> 
> In Bio::Structure an Entry consists of one or Models, each of which  
> has one or more Chains. This allows you to build macromolecular  
> complexes (an Entry), which could have more than one defined proteins  
> or protein complexes (Models).
> 
> Brian O.
> 
> On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> 
> > Hello,
> >
> > I am trying to do a very easy think but I don't get it. I want to  
> > write
> > in a file a chain of a pdb. I have try a lot of thinks but what I  
> > think
> > that it should work is the next script:
> >
> > use Bio::Structure::IO;
> > use strict;
> >
> > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> > =>
> > 'pdb');
> > my $struc = $structio->next_structure;
> >
> > my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> >
> > for my $chain ($struc->get_chains) {
> > 	if($chain->id eq "A"){
> > 		$new_entry->chain($chain);
> > 		last;
> > 	}
> > }
> >
> > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> > 'pdb');#
> > $out->write_structure($new_entry);
> >
> > it doesn't. I get the next error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK:
> > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> > 368
> > STACK:
> > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:335
> > STACK:
> > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:391
> > STACK:
> > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far I understand the documentation, the method chain of the object
> > Bio::Structure::Entry requires an as input an object of type Chain.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan
> >
> 


From fs5 at sanger.ac.uk  Mon Dec 14 07:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and if
I stick in a fake ID line, it warns about the missing sequence and the
fact that the features don't fit on the sequence (of length 0). 
Of course it's not difficult to write my own parser but I'm sure there
must be a BioPerl way of doing that that I have just overlooked. Thanks
for your help.




From David.Messina at sbc.su.se  Mon Dec 14 09:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
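
A rough sketch of that second route, assuming the feature table is available
as a string. The helper name wrap_feature_table, the ID FAKE_CHR1 and the
length 250000 are all made up; the length just needs to be at least as large
as the biggest coordinate in the table. The Bio::SeqIO step at the end is
guarded, and I have not checked it against every BioPerl release.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap bare FT lines in a minimal EMBL-style record with
# a fake ID line and an all-'n' sequence of the requested length.
sub wrap_feature_table {
    my ($ft_text, $id, $length) = @_;
    $ft_text .= "\n" unless $ft_text =~ /\n\z/;
    my $record = "ID   $id; SV 1; linear; genomic DNA; STD; UNC; $length BP.\n"
               . "XX\n"
               . $ft_text
               . "XX\n"
               . "SQ   Sequence $length BP; 0 A; 0 C; 0 G; 0 T; $length other;\n";
    for (my $pos = 0; $pos < $length; $pos += 60) {
        my $n = $length - $pos < 60 ? $length - $pos : 60;
        my $chunk = 'n' x $n;
        $chunk =~ s/(.{10})(?=.)/$1 /g;    # split into blocks of ten
        $record .= sprintf("     %-65s %9d\n", $chunk, $pos + $n);
    }
    return $record . "//\n";
}

my $ft = <<'END_FT';
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
END_FT

my $embl = wrap_feature_table($ft, 'FAKE_CHR1', 250_000);
print substr($embl, 0, index($embl, "SQ   ")), "...\n";

# Optional: hand the padded record to Bio::SeqIO if BioPerl is installed.
my $ok = eval {
    require Bio::SeqIO;
    open my $fh, '<', \$embl or die "cannot open in-memory record: $!";
    my $seq = Bio::SeqIO->new(-fh => $fh, -format => 'embl')->next_seq;
    for my $feat ($seq->get_SeqFeatures) {
        print $feat->primary_tag, "\t", $feat->location->to_FTstring, "\n";
    }
    1;
};
warn "BioPerl parse skipped or failed: $@" unless $ok;
```

Padding with 'n' keeps the features inside the sequence bounds, which is what
the length-0 warning in Frank's message was about.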


Dave


PS - I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation


From eric_donaldson at med.unc.edu  Mon Dec 14 09:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
	<7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID: <f750f0a17830d.4b2603e0@med.unc.edu>

Thank you Jason.  I appreciate the help.

Eric

----- Original Message -----
From: Jason Stajich <jason at bioperl.org>
Date: Monday, December 14, 2009 12:10 am
Subject: Re: [Bioperl-l] problem with install
To: eric_donaldson at med.unc.edu
Cc: BioPerl List <bioperl-l at bioperl.org>

> Eric -
> please CC the bioperl list when responding so others can help - I
> can't be the only answerer.
>
> But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you
> would need to make sure that is added to your PERL5LIB.
> There are some help docs on the perl sites I expect on how to get your
> PATHs in order.
>
> Or you can just install via CPAN which will put it in the right path -
> there are docs on the bioperl website about installing via CPAN.
> 
> -jason
> On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:
> 
> > Hi Jason,
> >
> > The fink package did not have support for perl 5.10, so I attempted
> > to install the perl 5.8.6 package.
> >
> > When I attempted: locate Bio/SearchIO.pm
> > I got: -bash: $: command not found
> >
> > So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/
> > SearchIO.pm  I cannot access it.  Do I need to use the older version
> > of perl?
> >
> > Would it be better to install with CPAN?  If so, can you send me to
> > a page that has instructions?
> >
> > Thank you so much!
> >
> > ERic
> >
> >
> > ----- Original Message -----
> > From: Jason Stajich <jason at bioperl.org>
> > Date: Sunday, December 13, 2009 11:10 pm
> > Subject: Re: [Bioperl-l] problem with install
> > To: eric_donaldson at med.unc.edu
> > Cc: BioPerl List <bioperl-l at bioperl.org>
> >
> >> So you installed perl-5.10 or using system perl? I'm
> >> confused if you
> >> actually installed bioperl.pm or not via fink?
> >>
> >> It seems like since your @INC or $PERL5LIB points to
> >> /sw/lib/perl5
> >> which is one of the dirs it would have installed in, but I don't
> >> think
> >> you actually installed bioperl.
> >>
> >> you can try and do:
> >> $ locate Bio/SearchIO.pm
> >>
> >> We'll see if any of the other osx/fink gurus are on the list
> >> that can
> >> help or you can install it via CPAN I guess.
> >>
> >> -jason
> >> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:
> >>
> >>>
> >>> I actually tried a different blastparser that uses
> >> BIO::SearchIO and
> >>> got the same message:
> >>>
> >>> Can't locate Bio/SearchIO.pm in @INC (@INC contains:
> >> /sw/lib/perl5/
> >>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin
> >> /
> >>> Library/Perl/Updates/5.10.0 
> /System/Library/Perl/5.10.0/darwin-
> >>
> >>> thread-multi-2level /System/Library/Perl/5.10.0
> >> /Library/Perl/5.10.0/
> >>> darwin-thread-multi-2level /Library/Perl/5.10.0
> >> /Network/Library/
> >>> Perl/5.10.0/darwin-thread-multi-2level
> >> /Network/Library/Perl/5.10.0 /
> >>> Network/Library/Perl 
> /System/Library/Perl/Extras/5.10.0/darwin-
> >>
> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
> >> at
> >>> blastparser.new.pl line 8.
> >>> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
> >>>
> >>> I suspect there is a path problem, but am not savvy enough to
> >> know
> >>> how to fix it. I am really just a hacker.... I have
> >> several scripts
> >>> that I use regularly and that I know how to modify, but am
> >> lost when
> >>> they don't work...
> >>>
> >>> Thanks for any help,
> >>>
> >>> Eric
> >>>
> >>> ----- Original Message -----
> >>> From: Jason Stajich <jason at bioperl.org>
> >>> Date: Sunday, December 13, 2009 8:24 pm
> >>> Subject: Re: [Bioperl-l] problem with install
> >>> To: eric_donaldson at med.unc.edu
> >>> Cc: bioperl-l at bioperl.org
> >>>
> >>>> Hi Eric -
> >>>>
> >>>> Bio::Tools::BPlite is no longer supported in Bioperl - it
> >>>> was
> >>>> deprecated several releases ago.
> >>>> It was replaced with Bio::SearchIO
> >>>>
> >>>> -jason
> >>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> Today I downloaded bioperl 1.61 on my new macbook pro using
> >>>> fink. I
> >>>>> used the
> >>>>>
> >>>>> fink install bioperl.pm-588 as I could not get it to instal
> >>>> using
> >>>>> the perl version 5.10.
> >>>>>
> >>>>> But now I get an error when trying to run a bioperl script.
> >>>>>
> >>>>> Here is the error:
> >>>>>
> >>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
> >>>> /sw/lib/
> >>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5
> >>>> /sw/lib/perl5/darwin /
> >>>>> Library/Perl/Updates/5.10.0
> >> /System/Library/Perl/5.10.0/darwin-
> >>>>
> >>>>> thread-multi-2level /System/Library/Perl/5.10.0
> >>>> /Library/Perl/5.10.0/
> >>>>> darwin-thread-multi-2level /Library/Perl/5.10.0
> >>>> /Network/Library/
> >>>>> Perl/5.10.0/darwin-thread-multi-2level
> >>>> /Network/Library/Perl/5.10.0 /
> >>>>> Network/Library/Perl
> >> /System/Library/Perl/Extras/5.10.0/darwin-
> >>>>
> >>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)
> >>>> at
> >>>>> blastparser.pl line 8.
> >>>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
> >>>>>
> >>>>>
> >>>>> I am a novice at unix and bioperl so I do not know how
> >>>> to
> >>>>> troubleshoot this, would you please hleo me?
> >>>>>
> >>>>> Thank you,
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>>
> >>>>> Eric F. Donaldson, Ph.D.
> >>>>> Research Assistant Professor, Ralph Baric Lab
> >>>>> University of North Carolina
> >>>>> Department of Epidemiology
> >>>>>
> >>>>>
> >>>>>
> >>>> <
> >>>>
> >> eric_donaldson.vcf>_______________________________________________>
> >>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>> --
> >>>> Jason Stajich
> >>>> jason.stajich at gmail.com
> >>>> jason at bioperl.org
> >>>>
> >>>>
> >>>
> >>> Eric F. Donaldson, Ph.D.
> >>> Research Assistant Professor, Ralph Baric Lab
> >>> University of North Carolina
> >>> Department of Epidemiology
> >>>
> >>>
> >>> <eric_donaldson.vcf>
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at gmail.com
> >> jason at bioperl.org
> >>
> >>
> >
> > Eric F. Donaldson, Ph.D.
> > Research Assistant Professor, Ralph Baric Lab
> > University of North Carolina
> > Department of Epidemiology
> >
> >
> > <eric_donaldson.vcf>
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> 
> 

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From umjsm at leeds.ac.uk  Mon Dec 14 11:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
	<1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,


To extract a PDB chain to a file, I had to add it atom by atom to a
new structure.

use strict;
use Bio::Structure::IO;
use Bio::Structure::Entry;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", -format => 'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new( -id  => 'structure_id');

# Copy chain "A" of the first model into the new entry,
# residue by residue and atom by atom.
for my $model ($struc->get_models){
	$new_struct->add_model($model);
	for my $chain ($struc->get_chains) {
		next unless $chain->id eq "A";	# skip every other chain
		$new_struct->add_chain($model,$chain);
		foreach my $res ($struc->get_residues($chain)){
			$new_struct->add_residue($chain,$res);
			foreach my $atom ($struc->get_atoms($res)){
				$new_struct->add_atom($res,$atom);
			}
		}
		last;	# chain A has been copied
	}
	last;	# only the first model is used
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", -format => 'pdb');
$out->write_structure($new_struct);

I suppose there must be a more elegant way to do it.

If someone knows one and can explain it, I would be very grateful.

kind regards, 
Joan

On Mon, 2009-12-14 at 10:39 +0000, Joan Segura Mora wrote:
> Hi Brian,
> 
> I am not calling the method add_chain, I am calling the method chain
> 
> http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6
> 
> and if I don't use as an argument an object of type
> 
> Bio::Structure::Chain
> 
> I get an error like this (-->depends of the argument<--)
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
> we want a Bio::Structure::Chain or a list of these
> 
> STACK: Error::throw
> STACK:
> Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
> STACK:
> Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
> STACK: read_pdb.pl:11
> -----------------------------------------------------------
> 
> 
> And if I use a Chain object I get the error that I told you.
> 
> I have try this code:
> 
> use Bio::Structure::IO;
> use strict;
> 
> my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
> 'pdb');
> my $struc = $structio->next_structure;
> my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> my $model = Bio::Structure::Model->new( -id  => '0');
> 
> for my $chain ($struc->get_chains) {
>         if($chain->id eq "A"){
>                 $new_entry->add_chain($model,$chain);
> 
>                 last;
>         }
> }
> $new_entry->add_model($model);
> my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> 'pdb');
> $out->write_structure($new_entry);
> 
> 
> But I get an empty pdb
> 
> HEADER    DEFAULT CLASSIFICATION                  24-JAN-70
> stru              
> REMARK
> 1                                                                      
> TER       1          A
> 0                                                      
> MASTER                                                                          
> END  
> 
> I am trying a lot of combinations, but I can't write a single chain into
> a file. I don't know what I am doing wrong.
> 
> Thanks for helping
> 
> regards,
> Joan
> 
> 
> On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> > Joan,
> > 
> > It looks to me like the first argument to the add_chain() method has  
> > to be a Model object, the second is the Chain itself. See Structure/ 
> > Entry.pm, for example. However if you're seeing some documentation  
> > that says something else then tell us where, it needs to be corrected.
> > 
> > In Bio::Structure an Entry consists of one or Models, each of which  
> > has one or more Chains. This allows you to build macromolecular  
> > complexes (an Entry), which could have more than one defined proteins  
> > or protein complexes (Models).
> > 
> > Brian O.
> > 
> > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> > 
> > > Hello,
> > >
> > > I am trying to do a very easy think but I don't get it. I want to  
> > > write
> > > in a file a chain of a pdb. I have try a lot of thinks but what I  
> > > think
> > > that it should work is the next script:
> > >
> > > use Bio::Structure::IO;
> > > use strict;
> > >
> > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> > > =>
> > > 'pdb');
> > > my $struc = $structio->next_structure;
> > >
> > > my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> > >
> > > for my $chain ($struc->get_chains) {
> > > 	if($chain->id eq "A"){
> > > 		$new_entry->chain($chain);
> > > 		last;
> > > 	}
> > > }
> > >
> > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> > > 'pdb');#
> > > $out->write_structure($new_entry);
> > >
> > > it doesn't. I get the next error:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: add_chain: first argument needs to be a Model object ()
> > >
> > > STACK: Error::throw
> > > STACK:
> > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> > > 368
> > > STACK:
> > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:335
> > > STACK:
> > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:391
> > > STACK:
> > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:304
> > > STACK: read_pdb.pl:10
> > > -----------------------------------------------------------
> > >
> > > As far I understand the documentation, the method chain of the object
> > > Bio::Structure::Entry requires an as input an object of type Chain.
> > >
> > > Any solution will be very welcome.
> > >
> > > best regards,
> > > Joan
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From gowthaman.ramasamy at sbri.org  Mon Dec 14 14:16:32 2009
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 14 Dec 2009 11:16:32 -0800
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
Message-ID: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>


Hi All,
I have a list of GO terms and would like to pull the GO accessions for them.
I can easily do the reverse using get_term("GO:0000051").

But can someone tell me how to get the GO accession from a GO term name, e.g. retrieve the GO accession for "citrulline metabolic process"?


Thanks very much,
Gowtham


From lsbrath at gmail.com  Mon Dec 14 14:41:39 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Mon, 14 Dec 2009 14:41:39 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>

Hello,

I have loaded BioPerl-1.6.0 onto my Mac. When I run my script I get the
following error message:

Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
/sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at project_example.pl line 4.
BEGIN failed--compilation aborted at project_example.pl line 4.

I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
Any ideas?

MEB


From scott at scottcain.net  Mon Dec 14 14:47:05 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 14 Dec 2009 14:47:05 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com>

Hi Mgavi,

I think Jason may have already started helping, but the question is:
is SeqIO.pm anywhere in those directories?  If not, why not?  If so,
why can't the perl you are using find it?  Do you have more than one
instance of perl on your machine (fairly likely if you are using a
fink-installed BioPerl)?  When you execute your script, which perl are
you using?
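
Those questions can be checked from Perl itself. The following is only a
sketch: the helper name find_in_inc is made up for illustration, and
Bio/SeqIO.pm is simply the file named in the error message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in @INC that contains
# the given module file (for example "Bio/SeqIO.pm").
sub find_in_inc {
    my ($relpath) = @_;
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $relpath);
        push @found, $dir if -f $path;
    }
    return @found;
}

# Report which perl binary is running and whether it can see the module.
printf "Running perl: %s (version %s)\n", $^X, $];
my @hits = find_in_inc('Bio/SeqIO.pm');
if (@hits) {
    print "Bio/SeqIO.pm found in: @hits\n";
} else {
    print "Bio/SeqIO.pm is not in any \@INC directory\n";
}
```

Run it with the same perl that runs your script (check the #! line). With
both a fink perl and the system perl installed, the two may report
different @INC lists.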

Scott


On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite <lsbrath at gmail.com> wrote:
> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
> Any ideas?
>
> MEB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bosborne11 at verizon.net  Mon Dec 14 14:45:35 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 14 Dec 2009 14:45:35 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net>

Mgavi,

So there's a directory called /sw/lib/perl5/Bio? Or is it called  
something else?

Brian O.


On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get  
> the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread- 
> multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- 
> multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ 
> 5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error  
> message.
> Any ideas?
>
> MEB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Dec 14 16:42:09 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 Dec 2009 13:42:09 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <C56E1117A61A4835B8E794D34A157F5B@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>
	<C56E1117A61A4835B8E794D34A157F5B@jonas>
Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org>

You can read the sreformat man page from Sean Eddy, or use it exactly as I
showed you:
sreformat fasta filename > filename.new

You can also use the first example, which is a BioPerl solution.
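
If neither tool is at hand, the rewrapping can also be sketched in plain
Perl. This is not what sreformat does internally, just a minimal
illustration of the requirement in the error message (every sequence line
the same length except the last). The helper name rewrap_fasta and the
width of 60 are arbitrary choices.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrap FASTA text so that every sequence line of a
# record has the same width (the last line of a record may be shorter).
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 60;
    my $seq = '';
    # Print the buffered sequence in fixed-width chunks, then clear it.
    my $flush = sub {
        print {$out} "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;          # strip the newline and trailing blanks
        if ($line =~ /^>/) {         # header: emit the pending sequence first
            $flush->();
            print {$out} "$line\n";
        } else {
            $line =~ s/\s+//g;       # drop whitespace inside sequence lines
            $seq .= $line;
        }
    }
    $flush->();                      # emit the last record
}

# Usage: perl rewrap.pl yourfile.fa > yournewfile.fa
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    rewrap_fasta($in, \*STDOUT, 60);
    close $in;
}
```

Note that this buffers one whole record in memory, so it is best suited to
files whose individual sequences are not enormous.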

-jason
On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote:

> Hi Jason,
> thank you very much for your answer.
> i am sorry to bother u again but i'm afraid i need some help with  
> that because i don't see how to use sreformat?
> i dont get it managed to write a script that works.
>
> thank u again :)
> jonas
>
>
> ----- Original Message ----- From: "Jason Stajich" <jason at bioperl.org>
> To: "Jonas Schaer" <Jonas_Schaer at gmx.de>
> Cc: <bioperl-l at bioperl.org>
> Sent: Tuesday, December 08, 2009 6:44 PM
> Subject: Re: [Bioperl-l] fasta format
>
>
>> you can run
>> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or
>> that is installed when you install the Bioperl scripts)
>> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o  
>> yournewfile.fa
>> # rename it back
>> $ mv yournewfile.fa yourfile.fa
>>
>> or
>> $ sreformat fasta yourfile.fa > yournewfile.fa
>> $ mv yournewfile.fa yourfile.fa
>>
>>
>> -jason
>> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:
>>
>>> Hi there,
>>> I have a little question concerning bioperl. I have
>>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
>>> in some fasta files. first it worked fine, but now i have some
>>> fastafiles in slightly different format (not all lines have the same
>>> length!).
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Each line of the fasta entry must be the same length except the
>>> last.
>>>   Line above #49 '
>>> ..' is 28 != 101 chars.
>>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/
>>> Fasta.pm:771
>>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: 
>>> 681
>>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
>>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
>>> STACK main::readfasta blast_eval.pm:174
>>> STACK toplevel blast_eval.pm:83
>>> -------------------------------------
>>>
>>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/
>>> site/lib/Bio/
>>> DB/Fasta.pm line 1054.
>>>
>>>
>>> Is there any way to use these fasta files with diffrent length of
>>> lines with this fasta.pm module or will i have to change the format
>>> of my fasta-files(big databases...) ?
>>>
>>> Thanks in advance for any help!
>>>
>>> Regards, Jonas
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>
>
> --------------------------------------------------------------------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date:  
> 12/08/09 07:34:00
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Mon Dec 14 20:23:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 14 Dec 2009 19:23:05 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>

All,

The output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:

1) Stockholm Rfam reverses start and end if the strand == -1
          
   chrY/598-1

2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end

   rice-3(+)/16598648-16600199

The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and to deprecate the simple non-stranded NSE?
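
To make the two variants concrete, here is a minimal plain-Perl sketch.
The function names are hypothetical and this is not the get_nse()
implementation; treating an undefined or zero strand as "+" in the second
variant is an assumption made only for the example.

```perl
use strict;
use warnings;

# Variant 1 (Stockholm/Rfam style): swap start and end on the minus strand.
sub nse_reversed {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Variant 2 (Gbrowse_syn style): put the strand in parentheses after the name.
# Assumption: an undefined or zero strand is written as "+".
sub nse_with_strand {
    my ($name, $start, $end, $strand) = @_;
    my $sign = (defined $strand && $strand < 0) ? '-' : '+';
    return "$name($sign)/$start-$end";
}

print nse_reversed('chrY', 1, 598, -1), "\n";
print nse_with_strand('rice-3', 16598648, 16600199, 1), "\n";
```

With the inputs above, the two calls reproduce the example strings from
variants 1 and 2.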

chris


From bernd.web at gmail.com  Tue Dec 15 03:37:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 15 Dec 2009 09:37:44 +0100
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
	<C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com>

Dear Gowthaman,

A non-BioPerl solution: the Ontology Lookup service at EBI. It also
provides a web service interface.

http://www.ebi.ac.uk/ontology-lookup/

"citrulline metabolic process" has to be selected from the pull-down
list in the interactive page. This will return the ID (GO:0000052) and
additional info:

definition	The chemical reactions and pathways involving citrulline,
N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins.
preferred name	citrulline metabolic process
exact synonym	citrulline metabolism
subset	Prokaryotic GO subset
xref_definition	ISBN:209853"Oxford Dictionary of Biochemistry and
Molecular Biology"

The webservice is described at
http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do
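
If you have a local OBO file, the name-to-accession lookup can also be
sketched in plain Perl. This is only an illustration, assuming the usual
OBO layout where each [Term] stanza has an "id:" line followed by a
"name:" line; the helper name obo_name_to_id is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read an OBO file handle and return a hash ref
# that maps each term name to its accession.
sub obo_name_to_id {
    my ($fh) = @_;
    my (%id_for, $in_term, $id);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^\[(\w+)\]$/) {      # stanza header, e.g. [Term] or [Typedef]
            $in_term = ($1 eq 'Term');
            undef $id;
        } elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        } elsif ($in_term && defined $id && $line =~ /^name:\s*(.+)$/) {
            $id_for{$1} = $id;
        }
    }
    return \%id_for;
}

# Usage: perl name2id.pl your_ontology.obo "citrulline metabolic process"
if (@ARGV == 2) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $id_for = obo_name_to_id($fh);
    close $fh;
    my $acc = $id_for->{ $ARGV[1] };
    print defined $acc ? "$acc\n" : "not found\n";
}
```

Synonyms are ignored here, so only exact preferred names will match.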


Regards,
Bernd


On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy
<gowthaman.ramasamy at sbri.org> wrote:
>
> Hi All,
> I have a list of GO terms. And would like to pull GO accessions for them.
> I can easily do the revere of it using get_term("GO::00000051").
>
> But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process".
>
>
> Thanks very much,
> Gowtham
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From fs5 at sanger.ac.uk  Tue Dec 15 05:38:40 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 15 Dec 2009 10:38:40 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
	<0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk>

Thanks Dave,
good to know that I haven't overlooked something bleedingly obvious in
Bioperl that already does this :-)
No problem, I have already implemented a simple parser to do it, which
works fine for my files. 
Thanks
Frank


On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote:
> Hi Frank,
> 
> You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12
> 
> Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.
> 
> It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
> 
> 
> Dave
> 
> 
> PS ? I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:
> 
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From rmb32 at cornell.edu  Tue Dec 15 10:09:43 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 15 Dec 2009 07:09:43 -0800
Subject: [Bioperl-l] AGI's fpc stuff:  Bio::Map::Physical, Bio::MapIO::fpc,
	etc
Message-ID: <4B27A6B7.6090709@cornell.edu>

Hi all,

Recently I caught an interesting thing related to making GFF files out
of FPC maps built using Bio::MapIO::fpc.  All of the 
coordinates in the resulting GFF3 and the sizes of the contigs and 
clones seem to be dilated by 4x from where they should be.

This didn't happen with some earlier FPC datasets I ran through these 
modules.

I haven't gone through any of this very thoroughly, but I notice in 
Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my 
$basepair = 4096', and the routine goes on to use $basepair as a sort of 
multiplier for converting the native physical map units into basepairs 
for GFF-style output.

This makes me wonder if the newer FPC datasets coming out require a 
different $basepair value, maybe 1024?  Are the original authors of 
these modules still around on this list?
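
To spell out the arithmetic behind that guess, here is a tiny plain-Perl
sketch. It does not call Bio::Map::Physical; the helper name and the
position of 250 map units are made up, and whether 1024 is the right
factor for newer datasets is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a position in physical map units to
# base pairs using a given scale factor.
sub units_to_bp {
    my ($units, $bp_per_unit) = @_;
    return $units * $bp_per_unit;
}

my $pos = 250;    # made-up clone position in map units
printf "factor 4096: %d bp\n", units_to_bp($pos, 4096);
printf "factor 1024: %d bp\n", units_to_bp($pos, 1024);
printf "ratio: %gx\n", 4096 / 1024;
```

A factor of 4096 where 1024 was expected gives exactly the 4x dilation
described above.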

Rob

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From tristan.lefebure at gmail.com  Tue Dec 15 12:18:26 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 15 Dec 2009 12:18:26 -0500
Subject: [Bioperl-l] ncurses and bioperl?
Message-ID: <200912151218.26357.tristan.lefebure@gmail.com>

Hello,

(Be careful: the following is a very naive question)

Something that I find myself missing is a simple way to look 
at alignments and trees on remote machines where I don't 
have access to X. Since, 
	(1) one can make wonderful terminal programs like screen 
and emacs by using ncurses, 
	(2) that alignment and tree objects are already well 
handled in bioperl, and 
	(3) that there is a CPAN Curses module; 

doing 1+2+3, may I dream of a Curses/BioPerl Perl program to 
render alignments and trees? I suppose a plain C program 
would be much better, but well I am a biologist...

Thanks,

--Tristan


From jason at bioperl.org  Tue Dec 15 12:50:52 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 15 Dec 2009 09:50:52 -0800
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <AEFA51CB-0070-4A1F-9FE3-DA4810129398@bioperl.org>

Not to say this isn't a good idea, but currently, for a curses-style
interface, I would use retree from PHYLIP for tree viewing. For short
read alignments, samtools tview or Gambit (MarthLab) works great, and
something like RALEE works for viewing MSA alignments (though it is
targeted at RNA alignment editing):
  http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/
  http://dx.doi.org/10.1093/bioinformatics/bth489

The point is just that there are prior examples, so you would be able to
learn from them if you still wanted to roll your own here.

-jason
On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote:

> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From roy.chaudhuri at gmail.com  Tue Dec 15 12:47:26 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 15 Dec 2009 17:47:26 +0000
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <4B27CBAE.5000303@gmail.com>

Hi Tristan,

Not a Bioperl solution, but retree from the Phylip package displays 
trees in a terminal.

Roy.

On 15/12/2009 17:18, Tristan Lefebure wrote:
> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From nml5566 at gmail.com  Tue Dec 15 16:37:30 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 15 Dec 2009 15:37:30 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>

Is the Bio::Ontology::OBOEngine module working or being currently
maintained? I tried following the documentation in the module:

 use Bio::Ontology::OBOEngine;

 my $parser = Bio::Ontology::OBOEngine->new
               ( -file => "gene_ontology.obo" );

 my $engine = $parser->parse();

But it throws an error when I run the file, saying 'Can't locate object
method "parse"'. Does anyone have any experience getting this module
working, or is there an alternative BioPerl module to extract terms and
relationships out of sequence ontology files?


From hlapp at drycafe.net  Tue Dec 15 17:05:10 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 15 Dec 2009 17:05:10 -0500
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>

That shouldn't happen, I suppose, but you're not really supposed to use
the engine directly. Rather, it is used behind the scenes by the
Bio::OntologyIO parser you choose. Have you tried that route and found
it not to work?

	-hilmar

On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:

> Is the Bio::Ontology::OBOEngine module working and currently
> maintained? I tried following the documentation in the module:
>
>  use Bio::Ontology::OBOEngine;
>
>  my $parser = Bio::Ontology::OBOEngine->new
>                ( -file => "gene_ontology.obo" );
>
>  my $engine = $parser->parse();
>
> But it throws an error when I run the file, saying 'Can't locate object
> method "parse"'. Does anyone have any experience getting this module
> working? Or is there an alternative BioPerl module to extract terms and
> relationships out of sequence ontology files?

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Wed Dec 16 04:58:16 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Dec 2009 10:58:16 +0100
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <DB8FB8FF-7DCE-4718-9E17-856F09AE1F46@sbc.su.se>

I'd lean towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.)

It's a question of whether to optimize for human-readability or machine-readability: option 2 favors the former, and option 1 the latter.

Whichever way you go, I think

> a new method that creates this, and deprecate[s] out simple non-stranded NSE

would be great.


Dave


From maj at fortinbras.us  Wed Dec 16 07:51:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 16 Dec 2009 07:51:24 -0500
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>

I'm with Dave; option 1 is cleaner. The only problem might be the automatic 
interpretation of older output as always plus strand, but presumably these would 
have had to record the strandedness explicitly elsewhere, so they would be 
updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 14, 2009 8:23 PM
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes


> All,
>
> The output for NSE format (Name/Start-End) via 
> Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have 
> seen two variations of NSE that incorporate strandedness:
>
> 1) Stockholm Rfam reverses start and end if the strand == -1
>
>   chrY/598-1
>
> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>
>   rice-3(+)/16598648-16600199
>
> The former breaks fewer things within BioPerl, but the latter seems more 
> explicit.  Any preferences?  Do we want a new method that creates this, and 
> deprecate out simple non-stranded NSE?
>
> chris
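
To make the two variants concrete, here is a minimal Perl sketch of how each
stranded form could be produced. The function names nse_rfam_style and
nse_gbrowse_style are hypothetical (they are not BioPerl methods), and
treating an undefined or zero strand as plus is an assumption of this sketch,
not something settled in the thread.

```perl
use strict;
use warnings;

# Option 1 (Stockholm/Rfam style): swap start and end on the minus strand.
sub nse_rfam_style {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand == -1;
    return "$name/$start-$end";
}

# Option 2 (Gbrowse_syn style): put the strand in parentheses after the name.
# Assumption: anything other than -1 is rendered as '+'.
sub nse_gbrowse_style {
    my ($name, $start, $end, $strand) = @_;
    my $symbol = (defined $strand && $strand == -1) ? '-' : '+';
    return "$name($symbol)/$start-$end";
}

print nse_rfam_style('chrY', 1, 598, -1), "\n";
print nse_gbrowse_style('rice-3', 16598648, 16600199, 1), "\n";
```

Option 1 keeps the name field untouched, so a parser only has to compare
start and end; option 2 needs the '(strand)' suffix stripped before the name
can be used.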


From tuco at pasteur.fr  Wed Dec 16 09:14:28 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 15:14:28 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO
	(Genbank)
Message-ID: <4B28EB44.3080006@pasteur.fr>

Hi,

I wrote a small GenBank parser a few months ago, before BioPerl release 
1.6.0.
I tried to use my code once again, but now the output of my parser is empty.
It looks like the annotation of seqfeatures is no longer populated.

Here is the code I used previously:

while(my $seq = $streamer->next_seq()){

     #We only want to retrieve CDS features...
     foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
         print $ofh join("#",
                         $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
                         $feat->annotation()->get_Annotations('gene')
                           ? $feat->annotation()->get_Annotations('gene')      # Gene name
                           : $feat->annotation()->get_Annotations('locus_tag'),
                         $feat->annotation()->get_Annotations('product'),      # Description
                        ),"\n";
     }
}

$feat is a Bio::SeqFeature::Generic object

If I print Dumper($feat->annotation()) here is the output :

$VAR1 = bless( {
                  '_typemap' => bless( {
                                         '_type' => {
                                                      'comment' => 'Bio::Annotation::Comment',
                                                      'reference' => 'Bio::Annotation::Reference',
                                                      'dblink' => 'Bio::Annotation::DBLink'
                                                    }
                                       }, 'Bio::Annotation::TypeManager' ),
                  '_annotation' => {}
                }, 'Bio::Annotation::Collection' );

Have changes been made to the way the annotation object is populated?

Thanks for any clue, and sorry if my question looks stupid.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From cjfields at illinois.edu  Wed Dec 16 10:09:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 16 Dec 2009 09:09:56 -0600
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <4B28EB44.3080006@pasteur.fr>
References: <4B28EB44.3080006@pasteur.fr>
Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>

Emmanuel,

The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation.  The way this was implemented was considered unsatisfactory for various reasons, so we reverted to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print "  tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print "    value: ", $value, "\n";
        }   
    }
}

You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.

chris
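
For the original '#'-separated output, the same tag-value calls could be
wrapped roughly as follows. This is only a sketch: first_tag_value and
cds_line are hypothetical helper names, and StubFeature is a stand-in class
so the code runs without BioPerl installed. The helpers call nothing but
has_tag() and get_tag_values(), which Bio::SeqFeature::Generic provides; the
has_tag() guard is there because, as far as I recall, get_tag_values() throws
when the tag is absent.

```perl
use strict;
use warnings;

# Return the first value of the first tag in @tags that the feature carries,
# or an empty string if none of them is present.
sub first_tag_value {
    my ($feat, @tags) = @_;
    for my $tag (@tags) {
        next unless $feat->has_tag($tag);
        my ($value) = $feat->get_tag_values($tag);
        return $value if defined $value;
    }
    return '';
}

# Build one output line: accession, gene name (falling back to locus_tag),
# and description, joined with '#'.
sub cds_line {
    my ($feat) = @_;
    return join('#',
        first_tag_value($feat, 'locus_tag'),
        first_tag_value($feat, 'gene', 'locus_tag'),
        first_tag_value($feat, 'product'),
    );
}

# Hypothetical stand-in for a feature object (not part of BioPerl).
package StubFeature;
sub new            { my ($class, %tags) = @_; return bless { tags => {%tags} }, $class }
sub has_tag        { my ($self, $tag) = @_; return exists $self->{tags}{$tag} }
sub get_tag_values { my ($self, $tag) = @_; return @{ $self->{tags}{$tag} } }

package main;

my $demo_feat = StubFeature->new(
    locus_tag => ['ABC_0001'],               # made-up example values
    product   => ['hypothetical protein'],
);
print cds_line($demo_feat), "\n";
```

With real data, cds_line would be called on each CDS feature returned by
get_SeqFeatures, in place of the annotation()->get_Annotations() calls.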

On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:

> Hi,
> 
> I wrote a small GenBank parser a few months ago, before BioPerl release 1.6.0.
> I tried to use my code once again, but now the output of my parser is empty.
> It looks like the annotation of seqfeatures is no longer populated.
> 
> Here is the code I used previously:
> 
> while(my $seq = $streamer->next_seq()){
> 
>    #We only want to retrieve CDS features...
>    foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>        print $ofh join("#",
>                        $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>                        $feat->annotation()->get_Annotations('gene')
>                          ? $feat->annotation()->get_Annotations('gene')      # Gene name
>                          : $feat->annotation()->get_Annotations('locus_tag'),
>                        $feat->annotation()->get_Annotations('product'),      # Description
>                       ),"\n";
>    }
> }
> 
> $feat is a Bio::SeqFeature::Generic object
> 
> If I print Dumper($feat->annotation()) here is the output :
> 
> $VAR1 = bless( {
>                 '_typemap' => bless( {
>                                        '_type' => {
>                                                     'comment' => 'Bio::Annotation::Comment',
>                                                     'reference' => 'Bio::Annotation::Reference',
>                                                     'dblink' => 'Bio::Annotation::DBLink'
>                                                   }
>                                      }, 'Bio::Annotation::TypeManager' ),
>                 '_annotation' => {}
>               }, 'Bio::Annotation::Collection' );
> 
> Have changes been made to the way the annotation object is populated?
> 
> Thanks for any clue, and sorry if my question looks stupid.
> 
> Regards
> 
> Emmanuel
> 
> -- 
> -------------------------
> Emmanuel Quevillon
> Biological Software and Databases Group
> Institut Pasteur
> +33 1 44 38 95 98
> tuco at_ pasteur dot fr
> -------------------------
> 


From tuco at pasteur.fr  Wed Dec 16 10:37:45 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 16:37:45 +0100
Subject: [Bioperl-l] Data missing into Annotation object
 using	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <4B28FEC9.1080509@pasteur.fr>

On 12/16/2009 04:09 PM, Chris Fields wrote:
> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation.  The way this was implemented was considered unsatisfactory for various reasons, so we reverted to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>      print "primary tag: ", $feat_object->primary_tag, "\n";
>      for my $tag ($feat_object->get_all_tags) {
>          print "  tag: ", $tag, "\n";
>          for my $value ($feat_object->get_tag_values($tag)) {
>              print "    value: ", $value, "\n";
>          }
>      }
> }
>
> You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
>    
Hi Chris

Thanks for the info.
I have indeed reverted to using $feat->get_tag_values() and it works as 
before.
For my small problem I can keep this solution, which is well suited to 
it.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From sung at bio.cc  Wed Dec 16 12:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

I wanted to get the PubMed identifier from a PDB file using Bio::Structure,
so I dug into the code.
I knew that Bio::Structure::IO::pdb.pm gets the relevant info from either
'JRNL' or 'REMARK 1'.
However, I could not see any actual code parsing 'PMID'.

From pdb.pm, what I see:

sub _read_PDB_jrnl {
...
           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

sub _read_PDB_remark_1 {
...
               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

From my script, I did:

($struc->annotation->get_Annotations('reference'))[0]->authors
($struc->annotation->get_Annotations('reference'))[0]->title

or

my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
for my $key (keys %{$hash_ref}) {
   print $key,": ",$hash_ref->{$key},"\n";
}

Is there any plan to include code that extracts the 'PMID'?
Or did I miss something?

Cheers,
Sung
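
In the meantime, a PMID can be pulled from the raw file with a few lines of
plain Perl. This is a minimal sketch that does not touch pdb.pm: it assumes
the file carries a 'JRNL' line with a 'PMID' sub-record, as newer PDB format
versions define, and pmid_from_pdb_lines is a hypothetical helper name. The
example records are made up.

```perl
use strict;
use warnings;

# Scan PDB-format lines and return the first PMID found in a JRNL record,
# or undef if there is none.
sub pmid_from_pdb_lines {
    my (@lines) = @_;
    for my $line (@lines) {
        return $1 if $line =~ /^JRNL\s+PMID\s+(\d+)/;
    }
    return;
}

# Made-up records, for illustration only.
my @example = (
    'JRNL        AUTH   A.N.AUTHOR',
    'JRNL        TITL   AN EXAMPLE TITLE',
    'JRNL        PMID   12345678',
);
my $pmid = pmid_from_pdb_lines(@example);
print defined $pmid ? "PMID: $pmid\n" : "no PMID found\n";
```

The same test on the sub-record name could presumably sit next to the
existing 'REFN' branch in _read_PDB_jrnl, but that would need checking
against the actual module code.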


From nml5566 at gmail.com  Wed Dec 16 14:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
	<F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes, I did find that and it works very well. Now I'm wondering: is
it possible to search for similar terms using a string instead of a
Bio::Ontology term object? For example, I'd like to search for the synonym
"transcription start site" and have it return all similar terms. But it
throws an error if I pass in a simple query like that.

-Nathan
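
One workaround is to filter the term objects by hand. The sketch below is
hedged: find_terms_by_text is a hypothetical helper, and StubTerm is a
stand-in class so the code runs without BioPerl. The helper only calls
name() and get_synonyms() on each term; with real data the list of terms
would come from the ontology returned by the Bio::OntologyIO parser (via
get_all_terms, as far as I know).

```perl
use strict;
use warnings;

# Return the terms whose name or any synonym contains $query
# (case-insensitive substring match).
sub find_terms_by_text {
    my ($query, @terms) = @_;
    my $pattern = qr/\Q$query\E/i;
    return grep {
        my $term = $_;
        grep { defined($_) && $_ =~ $pattern } ($term->name, $term->get_synonyms);
    } @terms;
}

# Hypothetical stand-in for an ontology term object (not part of BioPerl).
package StubTerm;
sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub identifier   { $_[0]{identifier} }
sub name         { $_[0]{name} }
sub get_synonyms { @{ $_[0]{synonyms} || [] } }

package main;

# Made-up terms and identifiers, for illustration only.
my @demo_terms = (
    StubTerm->new(identifier => 'X:1', name => 'TSS',
                  synonyms   => ['transcription start site']),
    StubTerm->new(identifier => 'X:2', name => 'exon'),
);
print $_->identifier, "\t", $_->name, "\n"
    for find_terms_by_text('transcription start site', @demo_terms);
```

This is a linear scan over all terms, which should be fine for an ontology
of a few thousand entries but is not an index.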

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> That shouldn't happen, I suppose, but you're not really supposed to use the
> engine directly. Rather, it is used behind the scenes by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
>        -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
>> Is the Bio::Ontology::OBOEngine module working and currently
>> maintained? I tried following the documentation in the module:
>>
>>  use Bio::Ontology::OBOEngine;
>>
>>  my $parser = Bio::Ontology::OBOEngine->new
>>               ( -file => "gene_ontology.obo" );
>>
>>  my $engine = $parser->parse();
>>
>> But it throws an error when I run the file, saying 'Can't locate object
>> method "parse"'. Does anyone have any experience getting this module
>> working? Or is there an alternative BioPerl module to extract terms and
>> relationships out of sequence ontology files?
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>


From cjfields1 at gmail.com  Wed Dec 16 19:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>

Howdy from Google Groups


From cjfields1 at gmail.com  Wed Dec 16 20:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the
creation of a mirror for the BioPerl mail list, if the last post
didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web
interface or via email.  Have fun!

chris


From ocarnorsk138 at gmail.com  Wed Dec 16 20:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
References: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields <cjfiel... at gmail.com> wrote:
> Howdy from Google Groups


From p.j.a.cock at googlemail.com  Thu Dec 17 05:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>

On Dec 17, 1:01 am, Chris Fields <cjfiel... at gmail.com> wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email.  Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of
an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed
to the mailing list to post (since the mailing list normally only
allows subscribers to post)?

Peter


From David.Messina at sbc.su.se  Thu Dec 17 07:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.


> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?


I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.


Dave


From cjfields at illinois.edu  Thu Dec 17 08:21:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 07:21:46 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu>


On Dec 17, 2009, at 6:25 AM, Dave Messina wrote:

> Very nice, Chris and Hilmar! That'll be great.
> 
> 
> 
>> Does this mean a Google Groups user doesn't have to be subscribed
>> to the mailing list to post (since the mailing list normally only
>> allows subscribers to post)?
> 
> 
> I think that's right. From the Google groups page:
> 
>> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
> 
> 
> 
> 
> Dave

It is moderated by user to deal with spam.  Hilmar's already a manager/co-owner, and either of us can add more as needed.

chris


From hlapp at drycafe.net  Thu Dec 17 09:52:33 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 17 Dec 2009 09:52:33 -0500
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>


On Dec 17, 2009, at 5:50 AM, Peter wrote:

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post


Yes. They can post through the Google Groups web interface.

The email address for mirrored groups is the one of the list being  
mirrored though, bioperl-l at bioperl.org in this case, and so in order  
to post by email you still have to be subscribed at the bioperl-l  
list. At least that's what the docs at Google say.

I haven't yet tried posting to the group at the bioperl-l at  
googlegroups dot com email from an email address that isn't  
subscribed to bioperl-l at bioperl dot org. Maybe it actually would  
work, contrary to the docs.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From jay at jays.net  Thu Dec 17 12:05:24 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 17 Dec 2009 11:05:24 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net>

On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote:
> I haven't yet tried posting to the group at the bioperl-l at googlegroups dot com email from an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to the docs.

In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. 

Here is the configuration set I recommend:

   http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png

Your membership rolls will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so.

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From robert.bradbury at gmail.com  Thu Dec 17 14:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>

Just to close out the issue of bioperl forking (in particular accesses to
external databases through get_sequence) which involves individual database
sub-modules and not collecting its children.

As it turns out the code does do an explicit fork, it looks like so the
child process can read from the database while the parent process
manipulates the data as it becomes available.  Now, one could argue that a
threaded model might be better since now threads are fairly standard OS
tools in current environments.

But I couldn't find any functions which actually wait for the forked process
(presumably because they are created for "future" use).  But nor is there
any indication in the pages I've found in most of the documentation (which
is spread across the web) or Wiki that explain that "creating child
processes" is how these functions work and one *needs* to collect those
children after each use or else zombie processes will accumulate, which on
"reasonable" systems with per-user process limits will create problems for
proper program functioning.  Nor (it would appear) does the parent process
set up a SIGCHLD "catcher" which could collect the processes once they exit
(which I expect in the case of "get_sequence" would be after closing of the
socket which actually fetched the sequence from GenBank).

It can be resolved easily enough by adding a call after each use of these
functions:
   $kid = waitpid(-1, WNOHANG);
But typically, as a programmer, I should not be responsible for having to
clean up the leftovers of library calls (unless said cleanup requirements
are clearly documented).


But to a "newbie" using the functions, coming from a functional background
(C), not an OO background (which at least I would tend to view as a wart on
the otherwise robust Perl language), there are two problems:
1. The lack of documentation and examples explaining how the functions work
and how they must be handled at a higher level (by executing explicit wait
system calls).
2. The lack of code in the BioPerl functions to deal with the forked
processes which they create.  Functional programmers have a perspective --
if you create it -- you have to clean it up.  It would appear that in the
transition to OO programming (or perhaps simply for expediency) that detail
was left out of both (either/and) the documentation and the code.  From this
standpoint one could view garbage collectors as being fundamentally evil --
because they gloss over the fact that programmers should know what they are
doing and when they are doing it.

So, everywhere in the documentation where there is a get_sequence call (or
anything which accesses an external database which causes a fork to occur)
there should be a modification as I have outlined above -- or else the code
should be corrected so orphaned children are always collected and not
allowed to accumulate.
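
As a concrete version of the waitpid suggestion above: a single
waitpid(-1, WNOHANG) call collects at most one finished child, so a loop is
safer when several have accumulated. The following is a minimal,
self-contained sketch; reap_finished_children is a hypothetical helper name,
not a BioPerl function, and the forked children here merely stand in for the
ones the library creates.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # exports WNOHANG

$| = 1;                     # flush output so children do not inherit buffered text

# Reap every child that has already exited, without blocking.
# Returns the number of children collected.
sub reap_finished_children {
    my $reaped = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Demonstration: fork a few short-lived children, then collect them.
my $children = 3;
for (1 .. $children) {
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: exit immediately
}

my $collected = 0;
while ($collected < $children) {
    $collected += reap_finished_children();
    select(undef, undef, undef, 0.05);    # brief pause between polls
}
print "collected $collected children\n";
```

In a script that calls get_sequence in a loop, calling such a helper after
each fetch should keep zombies from piling up against the per-user process
limit.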


From robert.bradbury at gmail.com  Thu Dec 17 15:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>

Oh, yes, in case it was not clear, the fork call which fails is in
DB/WebDBSeqI.pm, line 722:
     defined(my $pid = fork)
          or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limits for the
user (due to accumulated background processes which are uncollected).

And that could be resolved by simply executing a waitpid call for
prior orphaned children before forking [1].  But such a succinct solution
would violate "functional" programming rules -- clean up what you create --
instead they would tend to fall into the OO camp -- "Oh don't worry the
garbage collector will take care of it".  Green programming is a little less
cavalier.

Robert

1. IMO, a very very real problem with programming today is that there is no
connection between programmers and the cost of their programs.  How many
programmers know the instruction cycle time of their computers, what does an
instruction cost in terms of W consumed, W wasted (heat generation),
fruitless scanning over uncollected zombie processes, etc.  It may be that
only programmers who grew up in the era when CPU cycles were expensive
(300 ns/cycle), and who know what each instruction required in terms of
cycles, consider these perspectives.  Now things (CPU use, processor use, etc.)
tend to be swept under the rug, and it appears that that is the case with the
standard implementation of BioPerl.  The documentation does not clearly state
that additional sub-processes may be created and need to be collected.  You
are providing a utility that only works "this much".  And guess what -- I
happen to have run into the "this".


From cjfields at illinois.edu  Thu Dec 17 15:25:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:25:56 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <BFDD2A52-FB3D-4CC4-A5BF-C53A3DAC9C41@illinois.edu>

Robert,

I have previously outlined specifically why you are seeing the fork issue, and a possible solution.  IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast.  Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank.  Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime.

Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X.  We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution.

My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'.  What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so.  The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect).  Please keep that in mind.

chris

On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote:

> Just to close out the issue of bioperl forking (in particular accesses to
> external databases through get_sequence) which involves individual database
> sub-modules and not collecting its children.
> 
> As it turns out the code does do an explicit fork, it looks like so the
> child process can read from the database while the parent process
> manipulates the data as it becomes available.  Now, one could argue that a
> threaded model might be better since now threads are fairly standard OS
> tools in current environments.
> 
> But I couldn't find any functions which actually wait for the forked process
> (presumably because they are created for "future" use).  But nor is there
> any indication in the pages I've found in most of the documentation (which
> is spread across the web) or Wiki that explain that "creating child
> processes" is how these functions work and one *needs* to collect those
> children after each use or else zombie processes will accumulate, which on
> "reasonable" systems with per-user process limits will create problems for
> proper program functioning.  Nor (it would appear) does the parent process
> setup a SIGCHLD "catcher" which could collect the processes once they exit
> (which I expect in the case of "get_sequence" would be after closing of the
> socket which actually fetched the sequence from GenBank).
> 
> It can be resolved easily enough by adding a call after each use of these
> functions:
>   $kid = waitpid(-1, WNOHANG);
> But typically, as a programmer, I should not be responsible for having to
> clean up the leftovers of library calls (unless said cleanup requirements
> are clearly documented).
> 
> 
> But to a "newbie" using the functions, coming from a functional background
> (C), not an OO background (which at least I would tend to view as a wart on
> the otherwise robust Perl language), there are two problems
> 1. The lack of documentation and examples explaining how the functions work
> and how they must be handled at a higher level (by executing explicit wait
> system calls).
> 2. The lack of code in the BioPerl functions to deal with the forked
> processes which they create.  Functional programmers have a perspective --
> if you create it -- you have to clean it up.  It would appear that in the
> transition to OO programming (or perhaps simply for expediency) that detail
> was left out of both (either/and) the documentation and the code.  From this
> standpoint one could view garbage collectors as being fundamentally evil --
> because they gloss over the fact that programmers should know what they are
> doing and when they are doing it.
> 
> So, everywhere in the documentation where there is a get_sequence call (or
> anything which accesses an external database which causes a fork to occur)
> there should be a modification as I have outlined above -- or else the code
> should be corrected so orphaned children are always collected and not
> allowed to accumulate.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
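
For readers following the thread, the clean-up Robert describes above can
be sketched in core Perl. This is only an illustration of non-blocking
reaping with waitpid and WNOHANG (from the POSIX module); the helper name
reap_children is made up for this example and is not part of BioPerl, and
WNOHANG may not behave identically on every platform.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: collect every child that has already exited,
# without blocking. Returns the number of children collected.
sub reap_children {
    my $reaped = 0;
    # waitpid returns the PID of a collected child, 0 if children exist
    # but none has exited yet, and -1 if there are no children at all.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}
```

Calling reap_children() after each call that forks behind the scenes
(get_sequence, in the case discussed here) would keep zombies from piling
up against the per-user process limit.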


From cjfields at illinois.edu  Thu Dec 17 15:29:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:29:10 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
	<deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
Message-ID: <FF6F8AAD-FBBE-4FAD-BB88-59A779CC7131@illinois.edu>

On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote:

> Oh, yes, in case it was not clear, the fork call which fails is in
> DB/WebDBSeqI.pm: line 722
>     defined(my $pid = fork)
>          or $self->throw("'Couldn't fork: $!");

Okay, that's a bit more helpful.

> And of course that is because Linux has reached the process limits for the
> user (due to accumulated background processes which are uncollected).

Right, but again, we need to check this in a cross-platform compatible way.

> And they could be resolved by simply executing a simple waitpid call for
> prior orphaned children before forking [1] But such a succinct solution
> would violate "functional" programming rules -- clean up what you create --
> instead they would tend to fall into the OO camp -- "Oh don't worry the
> garbage collector will take care of it".  Green programming is a little less
> cavalier.
> 
> Robert
> 
> 1. IMO, a very very real problem with programming today is that there is no
> connection between programmers and the cost of their programs.  How many
> programmers know the instruction cycle time of their computers, what does an
> instruction cost in terms of W consumed, W wasted (heat generation),
> fruitless scanning over uncollected zombie processes, etc.  It may be that
> only programmers who grew up in the era when CPU cycles were expensive
> (300 ns/cycle), and who know what each instruction required in terms of
> cycles, consider these perspectives.  Now things (CPU use, processor use,
> etc.) tend to be swept under the rug, and it appears that that is the case
> with the standard implementation of BioPerl.  The documentation does not
> clearly state
> that additional sub-processes may be created and need to be collected.  You
> are providing a utility that only works "this much".  And guess what -- I
> happen to have run into the "this".

Um, yeah.  Okay.

chris
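
The "waitpid before forking" idea quoted above could look roughly like the
sketch below. It is an assumption-laden illustration in core Perl, not the
code in DB/WebDBSeqI.pm: the wrapper name fork_after_reaping is
hypothetical, and it dies on failure where the module would call
$self->throw.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical wrapper (not BioPerl code): collect finished children first,
# then fork, so zombies do not count against the per-user process limit.
sub fork_after_reaping {
    1 while waitpid(-1, WNOHANG) > 0;    # non-blocking; stops at 0 or -1
    my $pid = fork;
    die "Couldn't fork: $!" unless defined $pid;
    return $pid;    # 0 in the child, the child's PID in the parent
}
```

Whether this is portable enough is exactly the cross-platform question
raised in this message, so treat it as a starting point for a patch rather
than a finished fix.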


From robfsouza at gmail.com  Fri Dec 18 13:07:34 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Fri, 18 Dec 2009 13:07:34 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
Message-ID: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>

Hi,

I've been dealing with an apparent bug in the output of NCBI's BLAST
programs (blastall, blastpgp) which sometimes produces output like the
one below.
I think I've managed to produce a work around for Bioperl blast.pm
parser and would like to contribute it to Bioperl.
The fix is based on blast.pm from the CVS tree (downloaded some months
ago...) and is attached to this message.
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem
to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
          hypothetical protein [Nasonia vitripennis]
         Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
           P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
          ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
          LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
             DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
          +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
          ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                 +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091218/3771d91c/attachment-0002.obj>

From cjfields at illinois.edu  Fri Dec 18 13:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <DC79216C-9DD8-47AE-876F-7BBAEC6C43CB@illinois.edu>

Robson, 

Any chance you could check this against SVN?  We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris


On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
> 
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
> 
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> 
>> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
>           hypothetical protein [Nasonia vitripennis]
>          Length = 1774
> 
>  Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
>  Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)
> 
> Query: 0   -
> 
> Sbjct: 328 P                                                            328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
>            P PP +   + P       KTK+      K+P  K         +
> Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376
> 
> Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
>           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
> Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432
> 
> Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
>           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
> Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491
> 
> Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
>              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
> Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548
> 
> Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
>           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
> Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602
> 
> Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
>           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
> Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661
> 
> Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
>                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
> Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
> <blast_patched.pm>


From biopython at maubp.freeserve.co.uk  Fri Dec 18 18:00:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 18 Dec 2009 23:00:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>

On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson

Do you have a complete example of this kind of funny output?
This problem has also been reported with blastpgp for the
Biopython parser. I'd love an example for our unit tests
(probably worth doing in BioPerl too). Could you upload a
test case here?:

http://bugzilla.open-bio.org/show_bug.cgi?id=2927

Thanks!

Peter @ Biopython


From biopython at maubp.freeserve.co.uk  Sat Dec 19 06:19:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 11:19:53 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Thank you,

Peter


From maj at fortinbras.us  Sat Dec 19 14:52:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 19 Dec 2009 14:52:45 -0500
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
Message-ID: <F7E9AD08646A44D3AB29A4504A725095@NewLife>

Hi All, 

Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
is at beta in the bioperl-run trunk. It wraps all the programs of the 
NCBI's new blast+-2.2.22 suite 
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
and integrates them, allowing you to create, mask, and query 
databases from within a single factory object. See the HOWTO
http://www.bioperl.org/wiki/HOWTO:BlastPlus
for the usual usage and implementation details.

Happy coding--
MAJ 


From David.Messina at sbc.su.se  Sat Dec 19 15:34:10 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 19 Dec 2009 21:34:10 +0100
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se>

Sweet! Thanks, Mark.


Dave


From cjfields at illinois.edu  Sat Dec 19 17:44:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 16:44:46 -0600
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu>

Very nice!  We'll definitely give it a try here (along with the requisite feedback, of course).

chris

On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote:

> Hi All, 
> 
> Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
> is at beta in the bioperl-run trunk. It wraps all the programs of the 
> NCBI's new blast+-2.2.22 suite 
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and integrates them, allowing you to create, mask, and query 
> databases from within a single factory object. See the HOWTO
> http://www.bioperl.org/wiki/HOWTO:BlastPlus
> for the usual usage and implementation details.
> 
> Happy coding--
> MAJ 


From cjfields at illinois.edu  Sat Dec 19 23:59:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 22:59:38 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
	<6723123C0ABD447190639AE1F5D1A6A7@NewLife>
Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu>

I think option 1 is cleaner as well.  It was very easily added, so I've committed it to main trunk; I consider this a bug, since one can lose strand information when round-tripping data (original data with a -1 strand would be converted to +1).

I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else).

chris

On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote:

> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Monday, December 14, 2009 8:23 PM
> Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
> 
> 
>> All,
>> 
>> The output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:
>> 
>> 1) Stockholm Rfam reverses start and end if the strand == -1
>> 
>>  chrY/598-1
>> 
>> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>> 
>>  rice-3(+)/16598648-16600199
>> 
>> The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
>> 
>> chris
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
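
To make option 1 from this thread concrete, here is a minimal sketch of the
Rfam-style convention (start and end swapped when the strand is -1) and of
reading it back. The function names format_nse and parse_nse are
hypothetical; this is not the code committed to Bio::LocatableSeq, just an
illustration of the round trip being discussed.

```perl
use strict;
use warnings;

# Build a Name/Start-End string; a -1 strand is encoded by swapping
# start and end, as in the example chrY/598-1.
sub format_nse {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Parse it back; start > end is read as the minus strand.
# Note: a single-base location (start == end) cannot carry strand
# information in this scheme and is read as +1 here.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $from, $to) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "Not an NSE string: $nse";
    my $strand = $from > $to ? -1 : 1;
    my ($start, $end) = $strand < 0 ? ($to, $from) : ($from, $to);
    return ($name, $start, $end, $strand);
}
```

With this convention, older non-stranded output parses as plus strand,
which is the backward-compatibility caveat Mark raises above.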


From e.osimo at gmail.com  Sun Dec 20 13:19:37 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Sun, 20 Dec 2009 19:19:37 +0100
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>

Hello everyone,
I have a very particular problem: I'd like to draw in a single track
different SNPs with a glyph that allows me to see graphically their
importance.
For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like the
first drawn small and the last one big, with the ones in between sized
accordingly.
I'd also be satisfied with a color gradient.
What I cannot do is set an option such as -height in the
Bio::SeqFeature::Generic->new call that I use for each of my objects, rather
than in the add_track section.
If I set it in the add_track section, all the glyphs are then of the same
size (or color).
If, otherwise, I add a different track for each object, my picture becomes
too big.

Please, help!
Thanks
Emanuele


From ajmackey at gmail.com  Sun Dec 20 13:41:14 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Sun, 20 Dec 2009 13:41:14 -0500
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com>

You can set the height as a callback sub, rather than a constant -- the
callback will get passed the feature about to be drawn, from which you can
calculate the "importance", and return the desired height, dynamically.

-Aaron

On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo <e.osimo at gmail.com> wrote:

> Hello everyone,
> I have a very particular problem: I'd like to draw in a single track
> different SNPs with a glyph that allows me to see graphically their
> importance.
> For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the
> first depicted small, and the last one big, with the ones in between with
> according sizes.
> I'd be satisfied also with a color gradient.
> What I cannot do is to set the option -height , for example, instead than
> in
> the add_track section, in the Bio::SeqFeature::Generic->new that I use for
> each of my objects.
> If I set it in the add_track section, all the glyphs are then of the same
> size (or color).
> If, otherwise, I add a different track for each object, my picture becomes
> too big.
>
> Please, help!
> Thanks
> Emanuele
>
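
A rough sketch of the callback approach suggested above. It assumes the
"importance" of each SNP is stored in the feature's score (for instance via
-score in Bio::SeqFeature::Generic->new) on a 1..10 scale; that mapping and
the name height_by_score are choices made for this example only. The
add_track call is shown as a comment because it needs Bio::Graphics and a
real panel.

```perl
use strict;
use warnings;

# Callback for the -height option: it is handed the feature about to be
# drawn as its first argument and returns the glyph height in pixels.
sub height_by_score {
    my ($feature) = @_;
    my $score = $feature->score;
    $score = 1 unless defined $score;    # fall back to the smallest size
    return 4 + 2 * $score;               # 6 px for score 1, 24 px for score 10
}

# Intended use ($panel and @snps are placeholders):
#   $panel->add_track(\@snps,
#                     -glyph  => 'generic',
#                     -height => \&height_by_score);
```

The same pattern should work for a color gradient by passing a sub to a
color option such as -bgcolor and returning a color name per feature.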


From robfsouza at gmail.com  Sat Dec 19 06:06:16 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 19 Dec 2009 06:06:16 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>

Hi Peter,

I just uploaded my example. I also reported this bug to the NCBI
developers and I hope they can fix it, since it is easy to reproduce.
I just forgot to mention the blastpgp version: 2.2.18
Best,
Robson

On Fri, Dec 18, 2009 at 6:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
> <robfsouza at gmail.com> wrote:
>> Hi,
>>
>> I've been dealing with an apparent bug in the output of NCBI's BLAST
>> programs (blastall, blastpgp) which sometimes produces output like the
>> one below.
>> I think I've managed to produce a work around for Bioperl blast.pm
>> parser and would like to contribute it to Bioperl.
>> The fix is based on blast.pm from the CVS tree (downloaded some months
>> ago...) and is attached to this message.
>> Best,
>> Robson
>
> Do you have a complete example of this kind of funny output?
> This problem has also been reported with blastpgp for the
> Biopython parser. I'd love an example for our unit tests
> (probably worth doing in BioPerl too). Could you upload a
> test case here?:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>
> Thanks!
>
> Peter @ Biopython
>


From biopython at maubp.freeserve.co.uk  Mon Dec 21 10:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of
it "by hand", it would be very tricky to try and parse as a special
case. It really does look like a bug in BLAST to me. The alignment
includes an initial pair, a leading gap in the query (with a coordinate
of zero), plus a residue from the match sequence (with a sensible
coordinate). The alignment statistics include this (extra) pair in
the alignment length.

You said you were using blastpgp version 2.2.18, so I tried this
with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast
-num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.

I might therefore expect the NCBI to say "yes, this is a bug in
the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance
release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost
Christmas/New Year so we may not expect an answer until Jan.

Peter
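
Short of parsing the odd blocks, one could at least flag reports that
contain them. The sketch below is a hypothetical detection heuristic based
only on the symptom visible in this thread (alignment lines reading
"Query: 0"); it is neither Robson's patch nor anything in the BioPerl or
Biopython parsers, and the function name is made up.

```perl
use strict;
use warnings;

# Count alignment lines whose query coordinate is 0, the symptom seen in
# the blastpgp output quoted in this thread. Detection only; no repair.
sub count_zero_query_lines {
    my ($report_text) = @_;
    my $count = () = $report_text =~ /^Query:\s+0\b/mg;
    return $count;
}
```

A non-zero count would be a cue to warn the user or to re-run the search
with psiblast from BLAST+, as tried above.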


From maj at fortinbras.us  Mon Dec 21 13:52:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 13:52:01 -0500
Subject: [Bioperl-l] test fail
Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife>

fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)

t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
#          got: '1..4'
#     expected: 'complement(5..8)'

t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
#          got: 'complement(5..8)'
#     expected: '1..4'
# Looks like you failed 2 tests of 51.

MAJ


From cjfields at illinois.edu  Mon Dec 21 14:20:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 13:20:32 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>

Saw that from the other day (LocatableSeq commit).  I'll check it out.

chris

On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:

> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
> 
> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
> #          got: '1..4'
> #     expected: 'complement(5..8)'
> 
> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
> #          got: 'complement(5..8)'
> #     expected: '1..4'
> # Looks like you failed 2 tests of 51.
> 
> MAJ


From scott at scottcain.net  Mon Dec 21 15:02:20 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 21 Dec 2009 15:02:20 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>

Hi All,

Today it was pointed out to me that the Bio::Graphics documentation
links on the BioPerl wiki are broken, no doubt because Bio::Graphics
is no longer part of bioperl-core (is that how it should be referred
to?).  Anyway, the question is: what is the right way to rectify this
problem?  Since other things may get broken out in the future, I
suppose we should get some sort of standard established.  Can a
release of Bio::Graphics be placed somewhere on the BioPerl wiki
server to be processed?

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 15:22:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 14:22:39 -0600
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>

We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links.  Shouldn't be too hard to do.

chris

On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:

> Hi All,
> 
> Today it was pointed out to me that the Bio::Graphics documentation
> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
> is no longer part of bioperl-core (is that how it should be referred
> to?).  Anyway, the question is: what is the right way to rectify this
> problem?  Since other things may get broken out in the future, I
> suppose we should get some sort of standard established.  Can a
> release of Bio::Graphics be placed somewhere on the BioPerl wiki
> server to be processed?
> 
> Thanks,
> Scott
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 16:12:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 15:12:45 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
	<E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
Message-ID: <A396F39A-76BC-44B4-8302-4C622257E6ED@illinois.edu>

'Twas a bad test call.  I basically changed the test to pull each feature directly by its primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly.

chris

On Dec 21, 2009, at 1:20 PM, Chris Fields wrote:

> Saw that from the other day (LocatableSeq commit).  I'll check it out.
> 
> chris
> 
> On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:
> 
>> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>> 
>> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
>> #          got: '1..4'
>> #     expected: 'complement(5..8)'
>> 
>> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
>> #          got: 'complement(5..8)'
>> #     expected: '1..4'
>> # Looks like you failed 2 tests of 51.
>> 
>> MAJ


From maj at fortinbras.us  Mon Dec 21 16:27:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:27:25 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife>

I've modified Template:Doclink ; if you now do

{{Doclink|Bio::Graphics|cpan}}

you'll get a page with only the cpan link.

{{Doclink|Bio::SeqIO}}

etc. works as usual.
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot 
>> net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 16:34:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:34:40 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife>

Also, applied the new Doclink to Bio::Graphics on wiki.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot 
>> net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 21:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--

We didn't plan it, but we added it anyway: see revision 16559 of 
bioperl-live/trunk.
You can then do
$pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;
and even
$doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;
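
For those stuck on a release older than that revision, the same identifiers can be picked out of the raw JRNL records by hand. The following is only a minimal sketch: jrnl_ids is a hypothetical helper, the sample records are made up, and it assumes the PMID and DOI sub-records each fit on a single line (no continuation handling).

```perl
use strict;
use warnings;

# Pull PMID and DOI values out of PDB JRNL records.
# Returns a hash ref keyed by sub-record name.
sub jrnl_ids {
    my (@lines) = @_;
    my %ids;
    for my $line (@lines) {
        # "JRNL", the sub-record name, then the value
        if ($line =~ /^JRNL\s+(PMID|DOI)\s+(\S+)/) {
            $ids{$1} = $2;
        }
    }
    return \%ids;
}

# Made-up records for illustration only, not taken from a real entry
my @records = (
    'JRNL        TITL   AN EXAMPLE TITLE',
    'JRNL        PMID   12345678',
    'JRNL        DOI    10.1000/EXAMPLE',
);

my $ids = jrnl_ids(@records);
print "PMID: $ids->{PMID}\n";
print "DOI:  $ids->{DOI}\n";
```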

Thanks for the heads-up!
cheers,
MAJ
----- Original Message ----- 
From: "Sungsam Gong" <sung at bio.cc>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations


> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
>           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
>               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
> for my $key (keys %{$hash_ref}) {
>   print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Any plan to include a code chopping 'PMID' out?
> Or did I miss something?
>
> Cheers,
> Sung


From dan.kortschak at adelaide.edu.au  Mon Dec 21 22:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid
alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
bioperl-run tree).

I have 90% of what I want included in the module and would like some
advice from more experienced bioperlers. Feedback on approach is also
welcomed (this is my first significant wrapper, and after a long gap
from writing modules, so I am rusty). The module has ended up being
significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

     1. Informal tests run correctly (outside the t/ tree and Test
        harness), but formal Test harness tests fail for reasons I
        cannot understand. (The module is still lacking a lot of tests,
        but since things were failing in the harness I have placed them
        as a lower priority and have been working to my micro-script
        tests - yes, bad form.)
     2. I am having a big problem with IPC::Run for one of the
        executables (the module can call 5 different executables for 7
        commands), bowtie-maptool (module command 'map'). All the other
        commands tested (this excludes bowtie-maqconvert [convert
        command]) work fine, but maptool fails with an illegal seek -
        presumably due to the redirection handling? I have no idea how
        to resolve this, so help would be greatly appreciated (a small
        script that demonstrates the use that results in the failure is
        below).

There will be provision for returning a Bio::Assembly::IO object through
samtools in the finished module, but currently the
Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
	-command             => 'single',
	-max_seed_mismatches => 2,
	-seed_length         => 28,
	-max_qual_mismatch   => 70,
	-sam_format          => 0
	);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
	-command             => 'map'
	);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

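# Read the map output back in, failing loudly if it cannot be opened
open(my $in, '<', $map) or die "Cannot open $map: $!";
	my @lines = <$in>;
	my $lines = @lines;    # number of lines read
	print @lines;
	print "\n\n$lines\n";
close $in;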


From maj at fortinbras.us  Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <F7513FBADF944B51823A5F22FFA85911@NewLife>

Hey Dan, 
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could 
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'     => [qw( ind seq #out )],
    'paired'     => [qw( ind seq seq2 #out )],
    'crossbow'   => [qw( ind seq #out )],
    'build'      => [qw( ref out )],
    'inspect'    => [qw( ind >#out )],
    'convert'    => [qw( bwt out bfa )],
-    'map'        => [qw( bwt #out )]
+    'map'        => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.
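
For readers unfamiliar with the filespec strings in the diff above, here is a minimal sketch of how such a string can be taken apart. It assumes the convention that a leading '>' sends the program's STDOUT to the file and that '#' marks the file as optional. parse_filespec is a hypothetical helper written for this note, not a function from Bio::Tools::Run::Bowtie.

```perl
use strict;
use warnings;

# Split a filespec such as '>#out' into its parts.
# Assumed convention: '>' redirects STDOUT to the file, '#' marks it optional.
sub parse_filespec {
    my ($spec) = @_;
    my ($redirect, $optional, $name) = $spec =~ /^(>?)(#?)(\w+)$/
        or die "Unrecognized filespec '$spec'";
    return {
        name     => $name,
        redirect => $redirect ? 1 : 0,
        optional => $optional ? 1 : 0,
    };
}

# The old and new 'map' entries from the diff, plus a plain required file
for my $spec ('bwt', '#out', '>#out') {
    my $parsed = parse_filespec($spec);
    printf "%-6s name=%s redirect=%d optional=%d\n",
        $spec, @{$parsed}{qw(name redirect optional)};
}
```

Read this way, the one-character change only switches the output file from a command-line argument to a STDOUT redirect, which is why it should not be visible to the caller.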

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release 
(http://bowtie-bio.sourceforge.net/index.shtml)...

cheers MAJ

----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module


> Hi,
> 
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
> 
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing module, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
> 
> There are a few issues I'm having, so I apologise for the list:
> 
>     1. Informal tests run correctly (outside the t/ tree and Test
>        harness), but formal Test harness tests fail for reasons I
>        cannot understand. (The module is still lacking a lot of tests,
>        but since things were failing in the harness I have placed them
>        as a lower priority and have been working to my micro-script
>        tests - yes, bad form.
>     2. I am having a big problem with IPC::Run for one of the
>        executables (the module can call 5 different excutables for 7
>        commands), bowtie-maptool (module command 'map'). All the other
>        commands tested (this excludes bowtie-maqconvert [convert
>        command]) work fine, but maptool fails with an illegal seek -
>        presumably due to the redirection handling? I have no idea how
>        to resolve this, so help would be greatly appreciated (a small
>        script that demonstrates the use that results in the failure is
>        below).
> 
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
> 
> Thanks for any help.
> Dan
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::Tools::Run::Bowtie;
> 
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
> 
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
> -command             => 'single',
> -max_seed_mismatches => 2,
> -seed_length         => 28,
> -max_qual_mismatch   => 70,
> -sam_format          => 0
> );
> 
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
> 
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
> -command             => 'map'
> );
> 
> my $map = $bowtiemap->run($align); # throws Illegal seek
> 
> print "$map\n";
> 
> open (IN,$map);
> my $lines =(my @lines)= <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
> 
> 


From dan.kortschak at adelaide.edu.au  Tue Dec 22 00:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <F7513FBADF944B51823A5F22FFA85911@NewLife>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
	<F7513FBADF944B51823A5F22FFA85911@NewLife>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a
specified file and run it that way, but I've tried the redirect as you
suggest, with the same failure result. I think it's a strangeness of
maptool (which may well be a reason for it being dropped - also note the
maptool output doesn't seem reasonable for the test data provided even
when run from the command line).

It's probably a result of difficult interaction between IPC::Run and
maptool. Any funkiness in your code is not likely to be a cause as I've
deeply analysed what is being passed to IPC::Run, and I've quite
extensively modified the IPC run handling method from your code to take
into account the differences between a single executable with many
commands, as the base code managed, and a cluster of executables each
taking a small subset of different filespecs, as bowtie needs. My
funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module.

(Should test maqconvert - if it fails, this will be dropped also unless
someone asks otherwise).

When the module copes with 0.11.* properly I'll start thinking about
0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan, 
> It looks like if the outfile isn't specified on the commandline for
> maptool, then the align is written to stdout. So, you could 
> try this workaround in in Bowtie/Config.pm:
> 
> our %command_files = (
>     'single'     => [qw( ind seq #out )],
>     'paired'     => [qw( ind seq seq2 #out )],
>     'crossbow'   => [qw( ind seq #out )],
>     'build'      => [qw( ref out )],
>     'inspect'    => [qw( ind >#out )],
>     'convert'    => [qw( bwt out bfa )],
> -    'map'        => [qw( bwt #out )]
> +    'map'        => [qw( bwt >#out )]
>     );
> 
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
> 
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release 
> (http://bowtie-bio.sourceforge.net/index.shtml)...
> 
> cheers MAJ


From lovebaby39 at gmail.com  Wed Dec 23 05:48:55 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Wed, 23 Dec 2009 18:48:55 +0800
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>

Dear all

I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how 
to get "P.pastoris DNA for pPIC9K expression vector".

    while (my $result_u =  $blast_report_u-> next_result ) {
        while (my $hit_u = $result_u->next_hit()){
            while (my $hsp_u = $hit_u->next_hsp()){
                    $hit_u->name;
                    $hsp_u->evalue;
                    $hsp_u->score;
            }
        }
    }

I would appreciate it if you could tell me how to do it.

P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download 
link?)


The following is the BLAST result:
-------------------------------------------------------------------------------------------------------------------------------------
BLASTN 2.2.16 [Mar-25-2007]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Query=
         (458 letters)

Database: UniVec (build 4.0)
           2416 sequences; 597,480 total letters
Searching..................................................done
                                                                             
                                                                    Score    E
Sequences producing significant alignments:                        (bits) Value

gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1

>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
          Length = 2781

 Score = 26.3 bits (13), Expect = 3.1
 Identities = 13/13 (100%)
 Strand = Plus / Plus

Query: 352  tactaccgccatt 364
            |||||||||||||
Sbjct: 2209 tactaccgccatt 2221
-------------------------------------------------------------------------------------------------------------------------------------

Reginald Hsueh 


From hrh at fmi.ch  Wed Dec 23 10:14:06 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 23 Dec 2009 16:14:06 +0100
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>
Message-ID: <C757F24E.5FE2%hrh@fmi.ch>

Hi

Assuming you are using "SearchIO", try:

$hit_u->description

for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO
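
To put that in context, here is a minimal sketch of the whole loop with the description added. It assumes the report is plain-text BLAST output readable by Bio::SearchIO with -format => 'blast'; the report_hits wrapper and the file name in the usage comment are placeholders of mine, not part of BioPerl.

```perl
use strict;
use warnings;

# Print name, description, E-value and score for every HSP in a BLAST report.
sub report_hits {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded lazily, so BioPerl is only needed when a report is parsed
    my $report = Bio::SearchIO->new(-format => 'blast', -file => $file);
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $hit->name, $hit->description,
                                 $hsp->evalue, $hsp->score), "\n";
            }
        }
    }
    return;
}

# Usage: perl hits.pl blast_report.txt   (file name is a placeholder)
report_hits($ARGV[0]) if @ARGV;
```

name() gives the identifier at the start of the hit line and description() the free text after it, so the two together cover the full line from the report.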


Regards, Hans


On 12/23/09 11:48 AM, "Hsueh" <lovebaby39 at gmail.com> wrote:

> Dear all
> 
> I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how
> to get "P.pastoris DNA for pPIC9K expression vector".
> 
>     while (my $result_u =  $blast_report_u-> next_result ) {
>         while (my $hit_u = $result_u->next_hit()){
>             while (my $hsp_u = $hit_u->next_hsp()){
>                     $hit_u->name;
>                     $hsp_u->evalue;
>                     $hsp_u->score;
>             }
>         }
>     }
> 
> I will appreciate if you could tell me how to do it.
> 
> P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download
> link?)
> 
> 
> 
> The flow is BLAST result:
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> BLASTN 2.2.16 [Mar-25-2007]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (458 letters)
> 
> Database: UniVec (build 4.0)
>            2416 sequences; 597,480 total letters
> Searching..................................................done
>                  
>                                         Score    E
> Sequences producing significant alignments:
> (bits)     Value
> 
> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...
> 26   3.1
> gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo
> 26   3.1
> gnl|uv|U13843.1:1887-9923 pBPV cloning vector
> 26   3.1
> 
>> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
>           Length = 2781
> 
>  Score = 26.3 bits (13), Expect = 3.1
>  Identities = 13/13 (100%)
>  Strand = Plus / Plus
> 
> Query: 352  tactaccgccatt 364
>             |||||||||||||
> Sbjct: 2209 tactaccgccatt 2221
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> 
> Reginald Hsueh 
> 


From pkuonline at gmail.com  Wed Dec 23 13:36:49 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 12:36:49 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
Message-ID: <200912231236490784820@gmail.com>

Hi Everyone,

I used the latest BioPerl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz, and tried to parse a CODEML result. I searched the mailing list and found that the current PAML parser is compatible with PAML 4.3a: http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, Ziheng Yang recently updated PAML to 4.3b, and I found that the parser does not work with its output. More strangely, I tested it on an old PAML 4.1 result and it also failed.

I have attached my CODEML outputs here in case you have some idea what is going wrong.

Many thanks in advance!
 				
Best regards,
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.1
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.1
Type: application/octet-stream
Size: 11635 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.3b
Type: application/octet-stream
Size: 11330 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0010.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.3b
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0011.obj>

From cjfields at illinois.edu  Wed Dec 23 16:19:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 23 Dec 2009 15:19:48 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231236490784820@gmail.com>
References: <200912231236490784820@gmail.com>
Message-ID: <B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>

Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken the parser in just about every PAML revision.  Not sure when this will be addressed, unfortunately; my hope is sooner rather than later.

Can you file a bioperl bug report for this?  It's the best place to keep track.

http://bugzilla.open-bio.org/

chris

On Dec 23, 2009, at 12:36 PM, pkuonline wrote:

> Hi Everyone,
> 
> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> 
> I attached my CODEML outputs here to see whether you guys have some idea. 
> 
> Many thanks ahead!
> 				
> Best regards,
> -------------------------------------------------------------
> Yong Zhang
> Ph.D, Research Scholar
> Manyuan Long's Lab
> University of Chicago


From pkuonline at gmail.com  Wed Dec 23 17:45:54 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 16:45:54 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
Message-ID: <200912231645536094087@gmail.com>

Hi Chris,

Thanks for your reply and I just submitted this bug to bugzilla. 

Have a nice holiday!
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago

>-------------------------------------------------------------
>From: Chris Fields
>Time: 2009-12-23 15:19:50
>To: pkuonline  bioperl-l
>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1

>Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken in just about every PAML parser revision.  Not sure when this will be addressed unfortunately, my hope is sooner than later.
>
>Can you file a bioperl bug report for this?  It's the best place to keep track.
>
>http://bugzilla.open-bio.org/
>
>chris
>
>On Dec 23, 2009, at 12:36 PM, pkuonline wrote:
>
>> Hi Everyone,
>> 
>> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
>> 
>> I attached my CODEML outputs here to see whether you guys have some idea. 
>> 
>> Many thanks ahead!
>> 				
>> Best regards,
>> -------------------------------------------------------------
>> Yong Zhang
>> Ph.D, Research Scholar
>> Manyuan Long's Lab
>> University of Chicago
>


From David.Messina at sbc.su.se  Wed Dec 23 18:23:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 24 Dec 2009 00:23:44 +0100
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se>

Hi Yong,

Could you attach your codeml output to the bug report, too?

I'll take a look at this as soon as I can.


Dave


From maj at fortinbras.us  Thu Dec 24 00:47:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 24 Dec 2009 00:47:10 -0500
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife>

Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ
----- Original Message ----- 
From: "pkuonline" <pkuonline at gmail.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "bioperl-l" <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 23, 2009 5:45 PM
Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1


> Hi Chris,
>
> Thanks for your reply and I just submitted this bug to bugzilla.
>
> Have a nice holiday!
> -------------------------------------------------------------
> Yong Zhang
> Ph.D, Research Scholar
> Manyuan Long's Lab
> University of Chicago
>
>>-------------------------------------------------------------
>>From: Chris Fields
>>Time: 2009-12-23 15:19:50
>>To: pkuonline  bioperl-l
>>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
>
>>Well, not completely unexpected, but very frustrating nonetheless.  Changes to 
>>PAML output have broken in just about every PAML parser revision.  Not sure 
>>when this will be addressed unfortunately, my hope is sooner than later.
>>
>>Can you file a bioperl bug report for this?  It's the best place to keep 
>>track.
>>
>>http://bugzilla.open-bio.org/
>>
>>chris
>>
>>On Dec 23, 2009, at 12:36 PM, pkuonline wrote:
>>
>>> Hi Everyone,
>>>
>>> I used the latest Bioperl build, 
>>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to 
>>> parse CODEML result. I searched the mail list and found current PAML parser 
>>> is compatible with PAML 4.3a, 
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. 
>>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser 
>>> does not work. More strangely, I tested it on the old PAML 4.1 result and 
>>> also failed.
>>>
>>> I attached my CODEML outputs here to see whether you guys have some idea.
>>>
>>> Many thanks ahead!
>>>
>>> Best regards,
>>> -------------------------------------------------------------
>>> Yong Zhang
>>> Ph.D, Research Scholar
>>> Manyuan Long's Lab
>>> University of Chicago
>>
>
>
>




From bhakti.dwivedi at gmail.com  Fri Dec 25 21:46:51 2009
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Fri, 25 Dec 2009 21:46:51 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
Message-ID: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>

Hi,

Does anyone know how to retrieve the "Source" or the "Species name" given
the accession number using Bioperl?  I have 30,000 accession numbers
for which I need to get the source organisms.  Any kind of help will be
appreciated.

Thanks

BD


From maj at fortinbras.us  Fri Dec 25 22:52:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 25 Dec 2009 22:52:10 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use strict;
use warnings;
use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will (probably) work by changing
# -dbfrom to 'nucleotide'

my @ids = qw(1621261 89318838 68536103 20807972 730439);

# elink: map each protein id to its taxonomy id
my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
}

# keep only the ids that mapped to a taxon (removed records give undef)
@taxa = grep { defined } @taxa{@ids};

# esummary: look up the scientific name for each taxonomy id
$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db    => 'taxonomy',
                                    -id    => \@taxa);

while (my $docsum = $factory->next_DocSum) {
    $names{($docsum->get_contents_by_name('TaxId'))[0]} =
        ($docsum->get_contents_by_name('ScientificName'))[0];
}

foreach my $id (@ids) {
    $idmap{$id} = defined $taxa{$id} ? $names{$taxa{$id}} : undef;
}

# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
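
A minimal sketch of the chunking suggested above, assuming the
elink/esummary steps are wrapped in a code ref that takes a list of ids
and returns an id => name list. The names chunk_ids, lookup_in_chunks and
$fake_query are hypothetical and not part of BioPerl; the stand-in query
only exists so the sketch runs without a network connection.

```perl
use strict;
use warnings;

# Split a list of ids into array refs of at most $size elements each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Run $query (a code ref standing in for the elink/esummary steps) on each
# chunk, pausing $pause seconds between chunks, and merge the results.
sub lookup_in_chunks {
    my ($query, $size, $pause, @ids) = @_;
    my %idmap;
    my @chunks = chunk_ids($size, @ids);
    for my $i (0 .. $#chunks) {
        my %part = $query->(@{ $chunks[$i] });
        @idmap{ keys %part } = values %part;
        # no need to wait after the final chunk
        sleep $pause if $pause && $i < $#chunks;
    }
    return %idmap;
}

# Stand-in query for demonstration only; replace with the EUtilities code.
my $fake_query = sub { map { ($_ => "organism for $_") } @_ };

my %idmap = lookup_in_chunks($fake_query, 2, 0,
                             qw(1621261 89318838 68536103 20807972 730439));
print "$_ => $idmap{$_}\n" for sort keys %idmap;
```

For the real 30000 ids, a chunk size in the 1000-3000 range and a pause
of about 3 seconds would follow the suggestion above.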
MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?


> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl.   I have these 30,000 accession numbers
> for which I need to get the source organisms.  Any kind of help will be
> appreciated.
>
> Thanks
>
> BD


From cjfields at illinois.edu  Sat Dec 26 06:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <AD7C8B9A-61D1-443C-952E-BC7C66E398B2@illinois.edu>


On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ

A delay between requests is built in, so the explicit 'sleep 3' isn't needed; on main trunk it now matches NCBI's current spec of 3 queries/sec.

chris


From arpm9 at charter.net  Sun Dec 27 16:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l]  Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

hi chris,
 I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm


From pengyu.ut at gmail.com  Tue Dec 29 11:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython
to comment on the pros and cons of bioperl and biopython? I'm sending
this email to both bioperl and biopython mailing lists. But I hope
that it will not result in any contention.

I assume that the functionality of bioperl and biopython is the
same, i.e., tasks that can be done in bioperl can be done in biopython and
vice versa, as both libraries have been out there over 10 years.
Please correct me if my understanding is not true.

Given a task that can be done with either bioperl or biopython,
I, in particular, want to know how long it will take to write the
code for the task in bioperl and biopython, with the same readability
requirement (see below) and the assumption that users have the same
fluency in perl and python.

python is claimed to be good for maintainability. But perl is
criticized for there-are-many-ways-for-a-given-task. Since there are
multiple ways in perl, let us assume that we always use perl in a
readable way.


From jason at bioperl.org  Tue Dec 29 11:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or  
just curious about the advantages/disadvantages of language choice?

-jason
On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:

> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From ak at ebi.ac.uk  Tue Dec 29 11:57:18 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
> 
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there over 10 years.
> Please correct me if my understanding is not true.
> 
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
> 
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is
the same:  Which of the two programming languages are you (or your team)
most proficient in?  Use that language.

Regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From sdavis2 at mail.nih.gov  Tue Dec 29 12:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there over 10 years.
> Please correct me if my understanding is not true.

The two projects have similar goals, but saying that the functionality
is the same would be an extreme oversimplification.  You will need to
define what you want to do and then check to see what the two projects
have to offer.  This will, in general, require perusing the websites
for both projects as well as the relevant documentation.

> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.

Again, you will want to define the task(s) to be accomplished and then
weigh the pros and cons of each project combined with local expertise.
 If you don't know what you want to do, then you can certainly read
some examples on the websites and see which project strikes you as a
"winner" for you.

> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

These two statements are generalizations that provide little insight
into the strengths or weaknesses of the languages.  In other words,
one can write good or bad code in both languages.

Hope that helps.

Sean


From wenzhiwang1983 at yahoo.com.cn  Tue Dec 29 13:30:02 2009
From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi)
Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST)
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com>

Dear Jason,

Plink is a very useful program in population genetics, especially in the era of genome-wide SNP scans. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO?

Thanks.

Wenzhi Wang
   State Key Laboratory of Genetic Resources and Evolution
   Kunming Institute of Zoology, Chinese Academy of Sciences
   Kunming, Yunnan 650223 P. R. China
   Tel:  86 871 5198 993
   Fax: 86 871 5195 430
   E-mail: wenzhiwang1983 at yahoo.com.cn




From pengyu.ut at gmail.com  Tue Dec 29 13:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.

On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich <jason at bioperl.org> wrote:
> Are you asking for the purposes of choosing a toolkit for your work or just
> curious about the advantages/disadvantages of language choice?
>
> -jason
> On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:
>
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>> vice versa, as both libraries have been out there over 10 years.
>> Please correct me if my understanding is not true.
>>
>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From pengyu.ut at gmail.com  Tue Dec 29 14:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>> vice versa, as both libraries have been out there over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification.  You will need to
> define what you want to do and then check to see what the two projects
> have to offer.  This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

According to your experience, are there some tasks that are easier
with one than with another?

>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>
> Again, you will want to define the task(s) to be accomplished and then
> weigh the pros and cons of each project combined with local expertise.
>  If you don't know what you want to do, then you can certainly read
> some examples on the websites and see which project strikes you as a
> "winner" for you.
>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> These two statements are generalizations that provide little insight
> into the strengths or weaknesses of the languages.  In other words,
> one can write good or bad code in both languages.
>
> Hope that helps.
>
> Sean
>


From alperyilmaz at gmail.com  Tue Dec 29 14:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO,
	Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID: <dac81b0d0912291136x53edf2cjc6728e7062bd3bc1@mail.gmail.com>

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by
using a specific node as the root. I used Bio::Tree::Draw::Cladogram for
drawing and encountered some problems.

When I use whole tree and draw it, everything is fine; but, when I
pick a particular node and construct the subtree from that node's
ancestor by using "my $subtree = Bio::Tree::Tree->new(-root =>
$new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a
faulty EPS file, which contains extra lines added in the middle of the
file.
For instance:
.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
 81.25 lineto
  lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto


Also, I tried to write the subtree into a new phyloxml file first,
then draw it. The code is shown as follows:
my $savefile = "save.phyloxml";
my $treeout = Bio::TreeIO->new(-format => 'phyloxml',
                               -file   => ">$savefile");
$treeout->write_tree($subtree);
my $tree2 = Bio::TreeIO->new(-format => 'phyloxml',
                             -file   => $savefile);
my $t1 = $tree2->next_tree;
my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree   => $t1,
                                           -top    => 10,
                                           -bottom => 10);
$obj1->print(-file => $image_output);

The generated phyloxml file, save.phyloxml, has an additional
newline between "</phylogeny>" and "</phyloxml>" at the end of the
file, and this extra newline leads to an error when the file is parsed
(opened and drawn as EPS). When I removed the newline manually,
Bio::Tree::Draw::Cladogram produced the EPS file successfully.
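
The manual step could be scripted as a stopgap. This is only a sketch,
assuming the blank line before the closing tag is the only thing the
parser objects to; it does not fix the writer itself, and
strip_blank_before_close is a hypothetical helper name, not a BioPerl
function.

```perl
use strict;
use warnings;

# Collapse any blank lines between </phylogeny> and the closing
# </phyloxml> tag into a single newline; all other text is unchanged.
sub strip_blank_before_close {
    my ($xml) = @_;
    $xml =~ s{(</phylogeny>)\s*\n\s*(</phyloxml>)}{$1\n$2}g;
    return $xml;
}

# Small in-memory sample standing in for the contents of save.phyloxml.
my $sample = "<phyloxml>\n<phylogeny>\n</phylogeny>\n\n</phyloxml>\n";
print strip_blank_before_close($sample);
```

In practice the file contents would be read in, passed through the
function, and written back before being handed to Bio::TreeIO.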

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?

The phyloxml file I used can be downloaded from:
http://grassius.org/download/HSF.phyloxml

Run this code with the phyloxml file to see newline character problem:
http://pastebin.com/f87ee1ee

Run this code with the phyloxml file to see faulty eps file problem:
http://pastebin.com/fc4715a1

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From pengyu.ut at gmail.com  Tue Dec 29 16:32:17 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 15:32:17 -0600
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>

http://bioperl.org/Core/Latest/modules.html

Many links, if not all, are broken on the above page. Could somebody fix them?

For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
I see the following error.

There is currently no text in this page. You can search for this page
title in other pages, search the related logs, or edit this page.


From jason at bioperl.org  Tue Dec 29 16:49:00 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:49:00 -0800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>

That is an outdated URL; I am not sure where you are linking to it from.
We can probably now disable all old '/Core' URLs.

All documentation links are in the /wiki/

The beginner's howto is here for example
  http://bioperl.org/wiki/HOWTO:Beginners

> http://www.bioperl.org/wiki/HOWTOs


On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:

> http://bioperl.org/Core/Latest/modules.html
>
> Many links, if not all, are broken on the above page. Could somebody
> fix them?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID: <AA645194-F78E-4484-8952-02C40C1270F4@bioperl.org>

yep - be great if someone were to write it.  This being a volunteer  
project we welcome your contribution.  No I don't specifically have  
plans to do it, but maybe you can give it a try or another population  
genetics interested bioperl user/developer?

-jason
On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> Plink is a very useful program in population genetics,
> especially in the era of genome-wide SNP scans. Is there any plan to add
> the Plink (ped or tped) format to Bio::PopGen::IO?
>
> Thanks.
>
> Wenzhi Wang
>   State Key Laboratory of Genetic Resources and Evolution
>   Kunming Institute of Zoology, Chinese Academy of Sciences
>   Kunming, Yunnan 650223 P. R. China
>   Tel:  86 871 5198 993
>   Fax: 86 871 5195 430
>   E-mail: wenzhiwang1983 at yahoo.com.cn
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org>


On Dec 29, 2009, at 11:15 AM, Peng Yu wrote:

> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>  
> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com>  
>> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> to comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality of bioperl and biopython is the
>>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the  
>> functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two  
>> projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?

As you have still failed to give much insight into the 'tasks' it is  
hard to give you a better answer.

If there is a module or set of routines already written then yes one  
might be easier than the other. Otherwise it just depends on your  
strengths in the programming language.
We discussed the strengths of the different toolkits briefly on the  
podcast last month.  http://twit.tv/floss96

I echo Sean. Use whichever language you are a better programmer in.   
BioPerl is more mature in some facets than is BioPython, but BioPython  
has some components that are more heavily developed and supported than  
BioPerl (structures being one of those and interfacing that to pyMol  
would be a strength).   I personally think the Gbrowse, Bio-Graphics,  
and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence  
databases and Features is a critical aspect of mining  genomic data  
and features and use these heavily in my work, making BioPerl easy and  
powerful for my tasks. That and sequence and alignment parsing and  
reformatting.  But there are comparable tools written in python with  
and without BioPython that you can also use so mainly it is about  
building up an expertise in a toolkit and going forward.  The BioPerl  
faithful will probably say it is a more useful toolkit to us, but we are  
of course a biased sample.

Both projects can benefit from more users and developers contributing  
code and documentation so I would just jump in and give it a try if  
you are unsure which will be easier for you.

>
>>> Given a task that can be done with either bioperl or biopython,
>>> I, in particular, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same  
>>> readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and  
>> then
>> weigh the pros and cons of each project combined with local  
>> expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From pengyu.ut at gmail.com  Tue Dec 29 17:01:05 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:01:05 +1800
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and CDS boundary for 	a RefSeq ID?
Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>

I see the following example, but it is not clear to me how to get the
exon sequences. I also want to get the exon boundaries and the associated
CDS boundaries. Although I can get the boundary information from the UCSC
table browser, it would be convenient if I could get it in bioperl
along with the sequence.

Could somebody let me know how to do it?

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html


From sdavis2 at mail.nih.gov  Tue Dec 29 17:13:30 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 17:13:30 -0500
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com>

On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> http://bioperl.org/Core/Latest/modules.html
>
> Many links, if not all, are broken on the above page. Could somebody fix them?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

It is unfortunate that the links are broken on that page.  However, I
believe that page is somewhat outdated, anyway.  Here are the HOWTO
pages:

http://www.bioperl.org/wiki/HOWTOs

Sean


From pengyu.ut at gmail.com  Tue Dec 29 17:21:16 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:21:16 +1800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
	<A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com>

On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich <jason at bioperl.org> wrote:
> That is an outdated URL; I am not sure where you are linking to it from. We can
> probably now disable all old '/Core' URLs.

I was linked from here:

http://www.bioperl.org/wiki/BioPerl_Tutorial

Since those URLs are outdated, could you please fix the links on the above page?

> All documentation links are in the /wiki/
>
> The beginner's howto is here for example
>  http://bioperl.org/wiki/HOWTO:Beginners
>
>> http://www.bioperl.org/wiki/HOWTOs
>
>
> On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:
>
>> http://bioperl.org/Core/Latest/modules.html
>>
>> Many links, if not all, are broken on the above page. Could somebody fix
>> them?
>>
>> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
>> I see the following error.
>>
>> There is currently no text in this page. You can search for this page
>> title in other pages, search the related logs, or edit this page.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From sdavis2 at mail.nih.gov  Tue Dec 29 18:06:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 18:06:17 -0500
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and 	CDS boundary for a RefSeq ID?
In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I see the following example. But it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and associated
> CDS boundaries. Although, I can get the boundary information from ucsc
> table browser, but it would be convenient if I can get it in bioperl
> along with the sequence.
>
> Could somebody let me know how do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.  There may be some confusion, as the UCSC database aligns
RefSeq sequence to a genome to generate exon start and end
coordinates.  However, the RefSeq records retrieved by Bio::DB::RefSeq
are not in genomic context and so do not have start and end locations
on the genome.  That is, if you want the starts and ends along the
genome, that information is not available from the RefSeq record
itself, I don't think.  If that is what you need (genomic
coordinates), you can download the information directly from UCSC,
download flat files from NCBI mapview, or even from ensembl (using
biomart, for instance).  If you are looking for a bioperl-compliant
way of doing this, look at the Ensembl Perl API.

Sean


From jkhilmer at gmail.com  Tue Dec 29 14:55:18 2009
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 29 Dec 2009 12:55:18 -0700
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>

Personally, I think that the differences between Python and Perl
(although substantial) are not large enough to make the language
itself the deciding factor.

Instead, consider the larger community of software.  I haven't yet
found a situation in which Python cannot be applied: it can be used
with R (statistics); lower-level code C or fortran; visualization
software such as PyMol, Chimera, Blender, VTK; plotting with
matplotlib; and scipy/numpy or sage, which provide innumerable
benefits for computation, data-processing, etc.

Although I don't claim to have a great deal of experience with Perl, I
haven't seen the same integration with that language: I'm assuming it
can be used with R and VTK (not sure about C or Fortran?).  For this
reason, unless your work is highly targeted and you have no use for
programming-language integration with other software, I would
recommend Python.

For perl experts, I would truly appreciate any corrections you could
offer to these observations of mine, since I wouldn't mind using perl
if it offers benefits either in general or for specific applications.


Jonathan

On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython to
>>> comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?
>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From wgheath at gmail.com  Tue Dec 29 15:16:39 2009
From: wgheath at gmail.com (William Heath)
Date: Tue, 29 Dec 2009 12:16:39 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
	<81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
Message-ID: <f08ddf990912291216h32988b8cv20830c1b6701caf6@mail.gmail.com>

The biggest reason to go with python is ease of use.  Biologists are not
programmers, and the learning curve for python is much gentler than that of
perl.  I like perl but choose python for this reason.  Perl 6 does
address some of these issues, but it has not been fully implemented
yet.

-Tim

P.S.

I love, love, love cpan though which is only for perl right now :(

On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer <jkhilmer at gmail.com>wrote:

> Personally, I think that the differences between Python and Perl
> (although substantial) are not large enough to make the language
> itself the deciding factor.
>
> Instead, consider the larger community of software.  I haven't yet
> found a situation in which Python cannot be applied: it can be used
> with R (statistics); lower-level code C or fortran; visualization
> software such as PyMol, Chimera, Blender, VTK; plotting with
> matplotlib; and scipy/numpy or sage, which provide innumerable
> benefits for computation, data-processing, etc.
>
> Although I don't claim to have a great deal of experience with Perl, I
> haven't seen the same integration with that language: I'm assuming it
> can be used with R and VTK (not sure about C or fortran?).  For this
> reason, unless your work is highly targeted and you have no use
> programming language integration with other software, I would
> recommend Python.
>
> For perl experts, I would truly appreciate any corrections you could
> offer to these observations of mine, since I wouldn't mind using perl
> if it offers benefits either in general or for specific applications.
>
>
> Jonathan
>
> On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>> May I ask somebody who are versitile in both bioperl and biopython
> >>> comment on the pros and cons of bioperl and biopython? I'm sending
> >>> this email to both bioperl and biopython mailing lists. But I hope
> >>> that it will not result in any contention.
> >>>
> >>> I assume that the functionality between bioperl or biopython is the
> >>> same, i.e., tasks can be done in bioperl can be done biopython and
> >>> vice versa, as both libraries have been out there over 10 years.
> >>> Please correct me if my understanding is not true.
> >>
> >> The two projects have similar goals, but saying that the functionality
> >> is the same would be an extreme oversimplification.  You will need to
> >> define what you want to do and then check to see what the two projects
> >> have to offer.  This will, in general, require perusing the websites
> >> for both projects as well as the relevant documentation.
> >
> > According to your experience, are there some tasks that are easier
> > with one than with another?
> >
> >>> Given that a task that can be done with either bioperl or biopython,
> >>> I, in particularly, want to know how long it will take to write the
> >>> code for the task in bioperl and biopython, with the same readability
> >>> requirement (see below) and the assumption that users have the same
> >>> fluency in perl and python.
> >>
> >> Again, you will want to define the task(s) to be accomplished and then
> >> weigh the pros and cons of each project combined with local expertise.
> >>  If you don't know what you want to do, then you can certainly read
> >> some examples on the websites and see which project strikes you as a
> >> "winner" for you.
> >>
> >>> python is claimed to be good for maintainability. But perl is
> >>> criticized for there-are-many-ways-for-a-given-task. Since there are
> >>> multiple ways in perl, let us assume that we always use perl in a
> >>> readable way.
> >>
> >> These two statements are generalizations that provide little insight
> >> into the strengths or weaknesses of the languages.  In other words,
> >> one can write good or bad code in both languages.
> >>
> >> Hope that helps.
> >>
> >> Sean
> >>
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From pengyu.ut at gmail.com  Wed Dec 30 12:26:45 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 31 Dec 2009 11:26:45 +1800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>

With Bio::SeqIO, I can only read in the records in a fasta file one by
one. This is preferable if there are many records in a file.

But I also want to read all the records in. I could use a while loop
to read all the records in. But could somebody let me know if there is a
function in bioperl that can read in all the records at once and return
an object?

http://www.bioperl.org/wiki/HOWTO:SeqIO


From sdavis2 at mail.nih.gov  Wed Dec 30 13:04:53 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 30 Dec 2009 13:04:53 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>

On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?

In perl, you can use an array to store the records.  You could also
use a hash if you have reasonable keys for the entries.
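
A minimal sketch of that idea, not an official BioPerl recipe: the helper name slurp_seqs is made up for illustration, and it assumes the display ids in the file are unique (otherwise later records overwrite earlier ones in the hash). BioPerl is only loaded when a file name is passed on the command line.

```perl
use strict;
use warnings;

# Drain any object that offers next_seq() (e.g. a Bio::SeqIO stream)
# into an array (file order) and a hash keyed by display_id.
# slurp_seqs is a hypothetical helper name, not part of BioPerl.
sub slurp_seqs {
    my ($stream) = @_;
    my ( @seqs, %by_id );
    while ( my $seq = $stream->next_seq ) {
        push @seqs, $seq;
        $by_id{ $seq->display_id } = $seq;    # assumes ids are unique
    }
    return ( \@seqs, \%by_id );
}

# Only touch BioPerl when a FASTA file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'fasta' );
    my ( $seqs, $by_id ) = slurp_seqs($in);
    printf "%d records read\n", scalar @$seqs;
}
```

Everything ends up in memory, so this is only sensible for files that fit comfortably in RAM.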

Sean


> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jason at bioperl.org  Wed Dec 30 14:58:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 Dec 2009 11:58:54 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org>

Or use a database object so you can retrieve sequences that have a
particular id. See Bio::DB::Fasta.
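
A rough sketch of the lookup, under the assumption that the ids you ask for match the ids in the FASTA headers. The helper name fetch_by_ids is hypothetical; the Bio::DB::Fasta calls used are new() and get_Seq_by_id(). BioPerl is only loaded when a file and at least one id are given on the command line.

```perl
use strict;
use warnings;

# Look up a list of ids in an indexed FASTA database and return a hash
# reference of id => sequence string for the ids that were found.
# fetch_by_ids is a hypothetical helper name, not part of BioPerl.
sub fetch_by_ids {
    my ( $db, @ids ) = @_;
    my %found;
    for my $id (@ids) {
        my $seq = $db->get_Seq_by_id($id) or next;    # skip unknown ids
        $found{$id} = $seq->seq;
    }
    return \%found;
}

# Usage: script.pl my.fas id1 id2 ...
if ( @ARGV >= 2 ) {
    require Bio::DB::Fasta;
    my ( $fasta, @ids ) = @ARGV;
    my $db    = Bio::DB::Fasta->new($fasta);    # indexes the file if needed
    my $found = fetch_by_ids( $db, @ids );
    print ">$_\n$found->{$_}\n" for sort keys %$found;
}
```

Unlike reading everything into an array, the sequences stay on disk and only the requested ones are pulled in.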
On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:

> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>> by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and  
>> return
>> me an object?
>
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Wed Dec 30 16:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

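use Bio::AlignIO;

# read the whole FASTA file as a single alignment object
$alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
$aln = $alnio->next_aln;
@seqs = $aln->each_seq;   # list of the sequence objects in the alignment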

MAJ
----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?


> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From David.Messina at sbc.su.se  Thu Dec 31 05:55:32 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 31 Dec 2009 11:55:32 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
Message-ID: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From David.Messina at sbc.su.se  Tue Dec  1 05:14:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 1 Dec 2009 11:14:40 +0100
Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem
	to be parsed
In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk>
	<50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se>
	<8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <ECCDC4FE-DF46-4CF8-806F-750837DED8AA@sbc.su.se>

Hi Mick,

Did you try running the test case that you had originally attached to the bug report? Or is the below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave


On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
> 
> Just got round to looking at this.
> 
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
> 
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
> 
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
> 
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
> 
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
> 
> Is this another bug report?
> 
> Thanks again for all your help
> 
> Mick
> 
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
> 
> Hi Mick,
> 
> Sure thing -- the current build from subversion is packaged up every  
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
> 
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
> 
> 
> Dave
> 
> 
> 
> 
> On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote:
> 
>> Hi Dave
>> 
>> Thanks for the hard work.
>> 
>> Trying to get the latest updates so I can use this... don't have svn  
>> on my server, tried to install it and I don't have python either,  
>> which is needed to install it.
>> 
>> I face about 3 weeks whilst my IT department sort this out, unless I  
>> can access the changes any other way?
>> 
>> Thanks
>> Mick
>> 
>> -----Original Message-----
>> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- 
>> daemon at portal.open-bio.org]
>> Sent: 20 November 2009 15:12
>> To: michael watson (IAH-C)
>> Subject: [Bug 2937] Strand in fasta35 output does not seem to be  
>> parsed
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2937
>> 
>> 
>> online at davemessina.com changed:
>> 
>>          What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>            Status|NEW                         |RESOLVED
>>        Resolution|                            |FIXED
>> 
>> 
>> 
>> 
>> ------- Comment #7 from online at davemessina.com  2009-11-20 10:12 EST  
>> -------
>> Fixed in r16394.
>> 
>> Michael, thanks for the report. Your test cases pass, but please  
>> reopen the bug
>> if needed.
>> 
>> 
>> -- 
>> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? 
>> tab=email
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
> 


From e.osimo at gmail.com  Tue Dec  1 13:05:48 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 1 Dec 2009 19:05:48 +0100
Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test
Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com>

Hello everyone,
I'm trying to get the p value of a test performed with Statistics::TTest.
I cannot find a function for this: I can find out whether the null hypothesis is rejected
at a certain confidence level, but I cannot make the script show me the
actual p value.
Do you know of other modules that can do that?

Thanks
Emanuele


From cjfields at illinois.edu  Tue Dec  1 14:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below.  I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.

chris

Begin forwarded message:

> From: <utilities-announce at ncbi.nlm.nih.gov>
> Date: December 1, 2009 12:59:34 PM CST
> To: <utilities-announce at ncbi.nlm.nih.gov>
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
> 
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>  
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>  
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>  
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.
>  
> NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
>  
> Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.
>  
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
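
For anyone building E-utility URLs by hand, the policy quoted above comes down to always sending tool and email. Here is a rough sketch in plain Perl, not the BioPerl implementation: the helper names eutil_url and uri_escape_min, the tool name and the e-mail address are all made up for illustration, and the esearch.fcgi base URL is assumed to be the usual one.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 unreserved set.
# Assumes plain ASCII/byte input. Hypothetical helper name.
sub uri_escape_min {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an E-utility URL, refusing to proceed without non-empty
# tool and email values. Hypothetical helper name.
sub eutil_url {
    my ( $eutil, %p ) = @_;
    for my $required (qw(tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $p{$required} && length $p{$required};
    }
    my $query = join '&',
        map { "$_=" . uri_escape_min( $p{$_} ) } sort keys %p;
    return "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/$eutil.fcgi?$query";
}

print eutil_url(
    'esearch',
    db    => 'pubmed',
    term  => 'bioperl',
    tool  => 'my_script',         # hypothetical tool name
    email => 'me@example.org',    # hypothetical contact address
), "\n";
```

Failing early on the client side means a script missing either value stops before it ever sends a request that NCBI would reject.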


From maj at fortinbras.us  Tue Dec  1 21:27:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 21:27:06 -0500
Subject: [Bioperl-l] test test test
Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife>

MAJ


From ocarnorsk138 at gmail.com  Tue Dec  1 21:59:48 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Tue, 1 Dec 2009 23:59:48 -0300
Subject: [Bioperl-l] test test test
In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>

test test test test back


O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.


2009/12/1 Mark A. Jensen <maj at fortinbras.us>

> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Tue Dec  1 22:08:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 22:08:23 -0500
Subject: [Bioperl-l] test test test
In-Reply-To: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
	<b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
Message-ID: <CC7F9A12F9474D2BB5DC4E69190F2AE6@NewLife>

I love when people are paying attention!
  ----- Original Message ----- 
  From: Ocar Campos 
  To: Mark A. Jensen ; Bioperl Mailing List. 
  Sent: Tuesday, December 01, 2009 9:59 PM
  Subject: Re: [Bioperl-l] test test test


  test test test test back


  O'car Campos C.
  Bioinformatics Engineering Student.
  University of Talca.
  Chile.


  2009/12/1 Mark A. Jensen <maj at fortinbras.us>

    MAJ
    _______________________________________________
    Bioperl-l mailing list
    Bioperl-l at lists.open-bio.org
    http://lists.open-bio.org/mailman/listinfo/bioperl-l


From rtbio.2009 at gmail.com  Wed Dec  2 07:07:08 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Wed, 2 Dec 2009 13:07:08 +0100
Subject: [Bioperl-l] Remote blast
Message-ID: <c7cac1600912020407j176c83edm9f5a3d151f507bd2@mail.gmail.com>

Hello everyone,

I have a problem. I am new to Bioperl. I am working on an RNAi tool in which a
CGI script connects to NCBI BLAST using the remote BLAST
module (Bio::Tools::Run::RemoteBlast).

The sequence entered in the HTML page is taken as input, and a remote BLAST
search is performed on it. But I have a problem
in the remote BLAST code.

My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
$input1= $_[0];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


 while (my $input = $str->next_seq())

{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

  print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
          my $filename = $result->query_name()."\.out";
           $factory->save_output($filename);
          $factory->remove_rid($rid);
         #       open(BLASTDEBUGFILE,'>',$blastdebugfile);
  #     print BLASTDEBUGFILE "Test1  $result";
   #     close(BLASTDEBUGFILE);

     open(OUTFILE,'>',$outfile);
     print OUTFILE "Test2 $result->database_name()";
     close(OUTFILE);

    while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);

              # open(OUTFILE,'>',$outfile);
              # print OUTFILE "in while hits";
              #close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
}
# open(OUTFILE,'>',$outfile);
  #print OUTFILE $seqs[0];
 # close(OUTFILE);

return(@seqs);
}

In the above code, my program gets as far as the 'else' part and
writes the output file, i.e., this step:
my $filename = $result->query_name()."\.out";

But the program does not enter the next while loop, where I should get the
hits, i.e.,
it does not enter this:
while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);


Hence I am unable to get any hits for my query.
E.g., if the query's accession number is Tb11.02.2210, I just get a file
Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.


From ashvip at gmail.com  Wed Dec  2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install bioperl on my 32-bit Windows machine despite
repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh,
Senior Research Fellow,
Centre for Cellular and Molecular Biology,
Hyderabad - 500007
India.
contact - 91-040-27192778


From scott at scottcain.net  Wed Dec  2 09:18:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Dec 2009 09:18:37 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com>

Hello Vipin,

"do not seem to work" doesn't give us much to go on; can you tell us
what happened?

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From maj at fortinbras.us  Wed Dec  2 09:18:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information; your commands, error messages you received.
Thanks, 
Mark


From bcantarel at som.umaryland.edu  Wed Dec  2 13:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>

Hi all,
I am not sure if this is normal, but when I use Bio::SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.


For example, I have a sequence with a CDS on the minus strand, from 911 to 974.  The sequence is 974 nt long.  In the debugger I get:

x $cds->start
1
x $cds->end
64

How can I get the original coordinates?  Is there a command for that or will I have to just do the math?

Feature or Bug?


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore


From maj at fortinbras.us  Wed Dec  2 14:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
Message-ID: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an 
ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ


From bcantarel at som.umaryland.edu  Wed Dec  2 14:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code; the real code actually enters the data into a database.


use Bio::SeqIO;

my $in = Bio::SeqIO->new(-file   => $gbkfile,
                         -format => 'genbank');

W1: while (my $seq = $in->next_seq()) {
    my @feats = $seq->get_all_SeqFeatures();
    my $j = 0;
  F1: foreach my $cds (@feats) {
        next F1 unless ($cds->primary_tag() eq 'CDS');
        # do something with the cds start and cds end
    }
}
	 

LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...


For this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore



From maj at fortinbras.us  Wed Dec  2 14:48:44 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:48:44 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>

With fake sequence data and that header, I don't see the problem:

  DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'

Are you using the latest BioPerl (1.6.1 or the trunk)?
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank


Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
                       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
 F1:foreach $cds (@feats) {
        next F1 unless ($cds->primary_tag() eq 'CDS');
        ###>> debugger stops here for above output

        #do something with the cds start and cds end
        }
}




From cjfields at illinois.edu  Wed Dec  2 14:39:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 13:39:40 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu>

That one's odd; the coordinates should relate back to the original sequence.  Any chance you could pass on the sequence file so we can confirm it?  (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)
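
A quick arithmetic check on the numbers in this thread, as a minimal sketch: counting positions from the opposite end of a 974 nt sequence turns 1..64 into 911..974 (and back again). That would be consistent with coordinates reported relative to the reverse complement, but the thread does not establish that this is the cause. `flip_coords` is a hypothetical helper written for illustration; it is not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map a start/end pair counted
# from one end of a sequence to the same interval counted from the other end.
sub flip_coords {
    my ($start, $end, $seq_len) = @_;
    return ($seq_len - $end + 1, $seq_len - $start + 1);
}

# The reported coordinates (1..64) on a 974 nt sequence.
my ($start, $end) = flip_coords(1, 64, 974);
print "$start..$end\n";    # prints 911..974
```

Applying the same function to 911..974 gives 1..64 again, so "doing the math" is a single subtraction per coordinate if it ever becomes necessary.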

chris



From maj at fortinbras.us  Wed Dec  2 15:52:28 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 15:52:28 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife>

Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
as if there is a bug. If you can provide data that can reproduce
it, as Chris suggests, we can get onto it. 
thanks MAJ
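
To see why 1.006 reads as 1.6, here is a small sketch using Perl's core `version` module, which converts a decimal-style version into dotted form by reading the digits after the decimal point in groups of three. It assumes BioPerl's `$VERSION` follows that usual Perl convention; the value '1.006001' is included only to illustrate what a 1.6.1 release would look like under the same convention.

```perl
use strict;
use warnings;
use version;    # core module

# Translate decimal-style version numbers into dotted form.
# Pass them as strings so that no digits are lost to numeric formatting.
for my $decimal ('1.006', '1.006001') {
    printf "%s => %s\n", $decimal, version->parse($decimal)->normal;
}
```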
  ----- Original Message ----- 
  From: Brandi Cantarel 
  To: Mark A. Jensen 
  Sent: Wednesday, December 02, 2009 3:38 PM
  Subject: Re: [Bioperl-l] Parsing Genbank


  How can I tell what version I am using? When I use the command from the website:


  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'


  I get 1.006, but the BioPerl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release?


  Brandi




From cjfields at illinois.edu  Wed Dec  2 16:07:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 15:07:58 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
	<07332179362A4D53ACAA9A72AD208049@NewLife>
Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu>

One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since GBrowse extensively uses SeqFeatures (any changes should have popped out along the way).

There's not much we can do unless we have something to help confirm the problem.  It might also help to know the source of the GenBank file itself.

chris



From lstein at cshl.edu  Thu Dec  3 05:31:31 2009
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 Dec 2009 05:31:31 -0500
Subject: [Bioperl-l] modENCODE seeking data managers
Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com>

Hi All,

My apologies for spamming the list, but this announcement may be of
interest:


The modENCODE Data Coordinating Center (Model Organism Encyclopedia of DNA
Elements; www.modencode.org) is seeking data managers to gather and curate
large scale functional genomics data sets in fly and worm. For details, see
http://blog.modencode.org/?p=350.


Lincoln

-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From dan.bolser at gmail.com  Thu Dec  3 06:44:40 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 11:44:40 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>

Hi, can someone test the script here on zero length fasta / qual files?

http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ


It seems the output has an extra newline in the sequence part of the
output, which throws off scripts that rely on the 'four lines per
record' structure of FASTQ (although I'm not sure if it's illegal
FASTQ).

Here is what I see

BEGIN
$ head one.fna
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ head one.qual
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ createFastq.plx one.fna one.qual
@FVF7ZWH02PFOVG


+FVF7ZWH02PFOVG

END


Currently I just put in a clause in the script to skip any zero length
sequences, but I think the Qual shouldn't output an extra newline like
this.


Cheers,
Dan.


--

JHB: Bioinformatics is Biology and Biology is Bioinformatics.


From biopython at maubp.freeserve.co.uk  Thu Dec  3 07:12:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 12:12:15 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>

On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi, can someone test the script here on zero length fasta / qual files?
>
> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>
> It seems the output has an extra newline in the sequence part of the
> output (which throws off scripts that rely on the 'four lines per
> record' structure of the fastq (although I'm not sure if it's illegal
> fastq).

Hi Dan,

The OBF consensus was that FASTQ records with a zero-length
sequence might be useful, and should be output as exactly
four lines (one blank sequence line, one blank quality line).
However, for parsing, any number of blank lines should be OK.
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
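
The four-line rule described above can be sketched in plain Perl. This is only an illustration of the expected output shape, not the wiki script and not BioPerl's writer; `fastq_record` is a hypothetical helper, and repeating the identifier on the '+' line simply mirrors the output shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): format one FASTQ record as
# exactly four lines, even when the sequence and quality strings are empty.
sub fastq_record {
    my ($id, $seq, $qual) = @_;
    $seq  = '' unless defined $seq;
    $qual = '' unless defined $qual;
    die "sequence and quality lengths differ for $id\n"
        unless length($seq) == length($qual);
    return join("\n", '@' . $id, $seq, '+' . $id, $qual) . "\n";
}

# A zero-length record still yields one blank sequence line and one
# blank quality line.
my $record = fastq_record('FVF7ZWH02PFOVG', '', '');
my $lines  = () = $record =~ /\n/g;    # count newline-terminated lines
print $record;
print "lines: $lines\n";
```

A downstream script that relies on four lines per record can then read the output in fixed-size chunks without special-casing empty reads.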

I can confirm the perl script currently outputs a FASTQ file
with TWO blank lines for the sequence, giving five lines in
total for the zero length record. That does suggest a bug.
What version of BioPerl are you running?

Peter

P.S. The script is throwing away any description after the
identifier.


From dan.bolser at gmail.com  Thu Dec  3 08:07:27 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 13:07:27 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi, can someone test the script here on zero length fasta / qual files?
>>
>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>
>> It seems the output has an extra newline in the sequence part of the
>> output (which throws off scripts that rely on the 'four lines per
>> record' structure of the fastq (although I'm not sure if it's illegal
>> fastq).
>
> Hi Dan,
>
> The OBF consensus was FASTQ records with a zero length
> sequence might be useful, and should be output as exactly
> four lines (one blank sequence line, one blank quality line).
> However for parsing, any number of blank lines should be OK.
> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>
> I can confirm the perl script currently outputs a FASTQ file
> with TWO blank lines for the sequence, giving five lines in
> total for the zero length record. That does suggest a bug.
> What version of BioPerl are you running?

Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I
asked this question of the list rather than filing a bug report. What
version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
for the info).
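
For reference, here is a minimal sketch of what "exactly four lines" means
for a zero-length record. The helper name fastq_record is made up for this
illustration and is not part of BioPerl or of the wiki script:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): format one FASTQ record
# as exactly four lines, even when sequence and quality are empty.
sub fastq_record {
    my ($id, $desc, $seq, $qual) = @_;
    # Keep the description after the identifier when there is one.
    my $header = (defined $desc && length $desc) ? "$id $desc" : $id;
    return join("\n", '@' . $header, $seq, '+', $qual) . "\n";
}

my $record = fastq_record('read1', '', '', '');
my $lines  = ($record =~ tr/\n//);   # count newline characters
print "lines in zero-length record: $lines\n";   # prints 4
```

A record with two blank sequence lines would count five newlines here,
which is the symptom described above.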


> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly,
MediaWiki's diff features are less than optimal, so developing scripts
on the wiki isn't ideal. Anyone know how to plug git-hub into a script
apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could
basically build a whole release in the wiki. Which would be fun if
nothing else!


-- 

JHP: Biology is bioinformatics and bioinformatics is biology.


From heyne at informatik.uni-freiburg.de  Thu Dec  3 08:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

...

	my $ret;
	if ($self->strand() >= 0) {
		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
	} else {
		$ret = $id . $v. $char1 . $end . $char2 . $st ;
	}
	return $ret;
}
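
As a stand-alone illustration of the idea (not the actual LocatableSeq
code), the strand-aware formatting can be sketched like this. The function
name format_nse and its named arguments are assumptions made for the
example:

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not the BioPerl get_nse method:
# build "id.version/start-end", swapping the coordinates on the minus strand.
sub format_nse {
    my (%arg) = @_;
    my $v = (defined $arg{version} && length $arg{version}) ? ".$arg{version}" : '';
    my ($first, $second) = (($arg{strand} // 0) >= 0)
        ? ($arg{start}, $arg{end})
        : ($arg{end},   $arg{start});
    return "$arg{id}$v/$first-$second";
}

# Minus strand: coordinates are written high-to-low.
print format_nse(id => 'AB194432', version => 1, start => 846, end => 908, strand => -1), "\n";
# Plus strand: coordinates are written low-to-high.
print format_nse(id => 'AB194432', version => 1, start => 846, end => 908, strand => 1), "\n";
```

With the values from the original report this prints AB194432.1/908-846
for strand -1 and AB194432.1/846-908 for strand 1.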

Then I noticed while using $aln->remove_seq() that it cannot
remove a seq because it uses the wrong NSE to look up sequences. I changed
the following:

SimpleAlign.pm

sub remove_seq {

...
	$id = $seq->id();
    	$start = $seq->start();
    	$end  = $seq->end();

## changed code:

	my $v = $seq->version ? '.'.$seq->version : '';
    	if ($seq->strand >=0){
		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
	} elsif ($seq->strand == -1){
		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
	}	
...

}

The above code in LocatableSeq.pm worked when I read an alignment in
stockholm format and wrote it out in clustalw format. But if I read an
alignment in clustalw and write it out as stockholm (or something else)
it didn't work, as the strand is not correctly set in
ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

...

	my ( $sname, $start, $end, $strand );	## strand added
	$strand = 0;				## new, standard = 0???
        foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
        	( $sname, $start, $end ) = ( $1, $2, $3 );
		$strand = 1;			## new			
		if ($start > $end) {		## new
       		($start, $end, $strand) = ($end, $start, -1); ##new
		}				## new
	
      }
        else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq   => $alignments{$name},
            -id    => $sname,
            -start => $start,
            -end   => $end,
	    -strand=> $strand			## new
        );

...

}
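
The parsing side can be sketched on its own as well. This is only an
illustration of the swap-and-set-strand logic above, written without
BioPerl; parse_range_name is a hypothetical name, and accepting both ':'
and '/' as separator is an assumption made for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: split "name/start-end" or "name:start-end" into
# id, start, end and strand, normalising so that start <= end.
sub parse_range_name {
    my ($name) = @_;
    if ($name =~ m{^(\S+)[:/](\d+)-(\d+)$}) {
        my ($id, $start, $end) = ($1, $2, $3);
        my $strand = 1;
        # Coordinates written high-to-low indicate the minus strand.
        ($start, $end, $strand) = ($end, $start, -1) if $start > $end;
        return { id => $id, start => $start, end => $end, strand => $strand };
    }
    # No coordinates in the name: strand unknown (0), as in the patch above.
    return { id => $name, strand => 0 };
}

my $loc = parse_range_name('AB194432.1/908-846');
print "$loc->{id} start=$loc->{start} end=$loc->{end} strand=$loc->{strand}\n";
```

For the Rfam-style name from the original report this yields start 846,
end 908 and strand -1.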

I don't know whether I made these changes in the right places; I found
them only because I happened to use certain functions. I also don't know
how broadly a changed NSE in LocatableSeq.pm affects other modules and
functions. But I'm happy with my changes (so far :-)...).

Will you change this to your proposed way in bioperl trunk?

Thanks!

steffen


Chris Fields schrieb:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
> 
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
> 
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks.  Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
> 
> Does anyone have a problem with me adding this in?  It will change
> output for only those instances where the strand is -1, so
> 
> AB194432.1/908-846
> 
> would be start = 846, end = 908, strand = -1
> 
> AB194432.1/846-908
> 
> would be start = 846, end = 908, strand = 1
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany

Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de


From cjfields at illinois.edu  Thu Dec  3 08:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>

Dan,

On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

> 2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
>> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> Hi, can someone test the script here on zero length fasta / qual files?
>>> 
>>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>> 
>>> It seems the output has an extra newline in the sequence part of the
>>> output (which throws off scripts that rely on the 'four lines per
>>> record' structure of the fastq (although I'm not sure if it's illegal
>>> fastq).
>> 
>> Hi Dan,
>> 
>> The OBF consensus was FASTQ records with a zero length
>> sequence might be useful, and should be output as exactly
>> four lines (one blank sequence line, one blank quality line).
>> However for parsing, any number of blank lines should be OK.
>> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>> 
>> I can confirm the perl script currently outputs a FASTQ file
>> with TWO blank lines for the sequence, giving five lines in
>> total for the zero length record. That does suggest a bug.
>> What version of BioPerl are you running?
> 
> Hi Peter,
> 
> Basically, I'm not running the 'latest' version of BP, which is why I
> asked this question of the list rather than filing a bug report. What
> version are you running? ;-)
> 
> Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
> for the info).

FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN).  Basically, it now parses all three FASTQ variants.  However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1.  Peter can you confirm that?

>> Peter
>> 
>> P.S. The script is throwing away any description after the
>> identifier.
> 
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
> 
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc.  Think SourceForge, but a lot nicer and with no ads ;>

BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN). Basically, it now parses
> all three FASTQ variants. However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?

I had BioPerl from SVN circa 1.6.1 (not sure if this was before
or after the release of 1.6.1 now):

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069
$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
1.0069

If the tuples mean anything to you:

$ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
49.46.48.48.54.57
$ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
49.46.48.48.54.57

I just updated to revision 16435, and retested. I get the same
BioPerl version numbers, and the same extra blank line in the
sequence FASTQ output as Dan reported.
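
In case it helps anyone reading the tuples: when the version is held as a
plain string, %vd prints the ordinal of each character joined by dots, so
the numbers above are just the characters of "1.0069". A small sketch (the
variable names are only for illustration):

```perl
use strict;
use warnings;

my $version = '1.0069';

# %vd formats the ordinal of each character in the string, joined by '.'.
my $tuple = sprintf '%vd', $version;

# Reverse the process: turn each ordinal back into its character.
my $decoded = join '', map { chr } split /\./, $tuple;

print "$tuple\n";     # 49.46.48.48.54.57
print "$decoded\n";   # 1.0069
```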

Peter


From cjfields at illinois.edu  Thu Dec  3 09:39:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 08:39:35 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>

On Dec 3, 2009, at 8:20 AM, Peter wrote:

> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> FASTQ parsing had undergone a major revision prior to
>> 1.6.1 (the latest release in CPAN).  Basically, it now parses
>> all three FASTQ variants.  However, Peter indicates there
>> may still be a problem, and it's likely he's running 1.6.1.
>> Peter can you confirm that?
> 
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
> 
> If the tuples mean anything to you:
> 
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
> 
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
> 
> Peter

Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)?  Otherwise it might get lost on the mail list or wiki.

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.

> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

  my $bsq_obj =
    Bio::Seq::Quality->
	new( -id   => $seq_obj->id,
	     -seq  => $seq_obj->seq,
	     -qual => $qual_obj->qual,
	   );

With:

  my $bsq_obj =
    Bio::Seq::Quality->
	new( -id   => $seq_obj->id,
	     -description => $seq_obj->description,
             -seq  => $seq_obj->seq,
	     -qual => $qual_obj->qual,
	   );

Look - I seem to be learning Perl by osmosis ;)

Peter


From dan.bolser at gmail.com  Thu Dec  3 11:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
	<320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!


From dan.bolser at gmail.com  Thu Dec  3 11:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields <cjfields at illinois.edu>:
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs


From florent.angly at gmail.com  Thu Dec  3 13:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
	<4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding 
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related 
modules use LocatableSeq objects. They specify the sequence start but 
not the sequence end. This works in most cases, but I've recently 
encountered very occasional error messages related to not having
explicitly set the end of the sequence. I've been unable to put
together a small test case to reproduce the bug easily.

My question is: if the start of the sequence is set, is it mandatory to
set the end of the sequence? If so, then maybe the documentation needs 
to be explicit about it and maybe there needs to be a check that 
enforces that the end is set. In fact, it seems like if I provide a 
sequence and its start position, the LocatableSeq code should be able to 
automatically calculate its end, no?
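
To make the idea concrete, here is a rough sketch of how the end could be
derived from the start and the residue count. It is not what LocatableSeq
currently does; infer_end is a made-up name, and counting only letters as
residues follows the s/[^A-Za-z]//g approach in the ClustalW code quoted
below:

```perl
use strict;
use warnings;

# Hypothetical helper: derive the end coordinate from the start and the
# number of residues, ignoring gap characters such as '-' and '.'.
sub infer_end {
    my ($aligned_seq, $start) = @_;
    (my $residues = $aligned_seq) =~ s/[^A-Za-z]//g;
    # An all-gap sequence gives end = start - 1, i.e. an empty range.
    return $start + length($residues) - 1;
}

# Five residues starting at position 10 end at position 14.
print infer_end('AC-GT..A', 10), "\n";
```

Whether an all-gap sequence should be reported as an empty range or as an
error is a design choice this sketch does not settle.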

Florent


Steffen Heyne wrote:
> Hello,
>
> so I tried to fix the problem with the location. Currently it works for
> me with the following changes:
>
> LocatableSeq.pm
>
> sub get_nse{
>
> ...
>
> 	my $ret;
> 	if ($self->strand() >= 0) {
> 		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
> 	} else {
> 		$ret = $id . $v. $char1 . $end . $char2 . $st ;
> 	}
> 	return $ret;
> }
>
> Then I noticed while using $aln->remove_seq() that it cannot
> remove a seq because it uses the wrong NSE to look up sequences. I changed
> the following:
>
> SimpleAlign.pm
>
> sub remove_seq {
>
> ...
> 	$id = $seq->id();
>     	$start = $seq->start();
>     	$end  = $seq->end();
>
> ## changed code:
>
> 	my $v = $seq->version ? '.'.$seq->version : '';
>     	if ($seq->strand >=0){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
> 	} elsif ($seq->strand == -1){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
> 	}	
> ...
>
> }
>
> The above code in LocatableSeq.pm worked when I read an alignment in
> stockholm format and wrote it out in clustalw format. But if I read an
> alignment in clustalw and write it out as stockholm (or something else)
> it didn't work, as the strand is not correctly set in
> ClustalW::next_aln. It works with the following changes:
>
> ClustalW.pm
>
> sub next_aln{
>
> ...
>
> 	my ( $sname, $start, $end, $strand );	## strand added
> 	$strand = 0;				## new, standard = 0???
>     	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys
> %alignments ) {
>         if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
>         	( $sname, $start, $end ) = ( $1, $2, $3 );
> 		$strand = 1;			## new			
> 		if ($start > $end) {		## new
>        		($start, $end, $strand) = ($end, $start, -1); ##new
> 		}				## new
> 	
>       }
>         else {
>             ( $sname, $start ) = ( $name, 1 );
>             my $str = $alignments{$name};
>             $str =~ s/[^A-Za-z]//g;
>             $end = length($str);
>         }
>
>         my $seq = Bio::LocatableSeq->new(
>             -seq   => $alignments{$name},
>             -id    => $sname,
>             -start => $start,
>             -end   => $end,
> 	    -strand=> $strand			## new
>         );
>
> ...
>
> }
>
> I don't know whether I made these changes in the right places; I found
> them only because I happened to use certain functions. I also don't know
> how broadly a changed NSE in LocatableSeq.pm affects other modules and
> functions. But I'm happy with my changes (so far :-)...).
>
> Will you change this to your proposed way in bioperl trunk?
>
> Thanks!
>
> steffen
>
>
> Chris Fields schrieb:
>   
>> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>>
>>     
>>> Hi,
>>>
>>> I'm using Bioperl for my research and it is very useful! Thank you!
>>>
>>> Currently I have a problem with locations tags of sequences. I read in
>>> seed alignments of Rfam (in stockholm format, but I think it is
>>> similar to other formats).
>>>
>>> If the location is like:
>>>
>>> AB194432.1/908-846
>>>
>>> the start/end values are changed to
>>>
>>> $seq->start = 846
>>> $seq->end = 908
>>>
>>> and therefore the new location (e.g.$seq->get_nse) is:
>>>
>>> AB194432.1/846-908
>>>
>>> The $seq->strand tag is correctly set to -1 in this case, but if the
>>> alignment is written out again (clustal, stockholm,...) this strand
>>> info is lost and the sequences have this "wrong" location. But this
>>> information is important in respect to the sequence accession number.
>>>
>>> Is there a way to set the location back to the original one or is this
>>> behavior desired? Any manually setting with $seq->start($val) failed
>>> due to automatic checking.
>>>
>>> I'm using bioperl 1.6.1
>>>
>>> Thanks!
>>>
>>> steffen
>>>       
>> This is a definite bug. We recently discussed amending the NSE format
>> due to this (the subject came up over the last few months or so); it's
>> fallen through the cracks.  Fortunately it is very easy to fix (the
>> relevant method is in LocatableSeq).
>>
>> Does anyone have a problem with me adding this in?  It will change
>> output for only those instances where the strand is -1, so
>>
>> AB194432.1/908-846
>>
>> would be start = 846, end = 908, strand = -1
>>
>> AB194432.1/846-908
>>
>> would be start = 846, end = 908, strand = 1
>>
>> chris


From cjfields at illinois.edu  Thu Dec  3 23:16:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 22:16:48 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu>


On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote:

> 2009/12/3 Chris Fields <cjfields at illinois.edu>:
>> Dan,
>> 
>> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:
> 
> ...
> 
>>> I'm wondering, because with the FlaggedRevs extension you could
>>> basically build a whole release in the wiki. Which would be fun if
>>> nothing else!
>> 
>> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (
> 
> I never said it would be beneficial, only that it would be fun.
> 
> http://www.mediawiki.org/wiki/Flaggedrevs

Ah, okay, that makes some sense.  

Just to stay on subject, committed a fix (r16439) to bioperl-live that addresses the additional newline issue.

chris


From rtbio.2009 at gmail.com  Fri Dec  4 08:57:21 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 4 Dec 2009 14:57:21 +0100
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
Message-ID: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>

Hello all,

I am working with RemoteBlast. I am trying to pass two parameters into the
remote BLAST code. They are:

1. The input sequence that has to be sent to BLAST

2. Organism (the organism to restrict the search to, e.g. Trypanosoma brucei)

When I take the organism parameter as input from the user, through a web
page, RemoteBlast does not give any results, i.e. it says that no
alignments were found.

But when I hard-code the organism in the code, it gives me results, i.e.
3 hits.

I cannot understand this problem. Could anybody please help me with it?

My code is

sub blastcode
{

$input1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
               print OUTFILE @params;
              close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-Organism' => $organism );

while (my $input = $str->next_seq())

{
#Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

   # my $r = $factory->submit_blast('amino.fa');

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

      #    open(BLASTDEBUGFILE,'>',$debugfile);
       #   print BLASTDEBUGFILE $result->next_hit();
        #  close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);
$factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);
}

Regards,
Roopa.


From cjfields at illinois.edu  Fri Dec  4 09:59:17 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Dec 2009 08:59:17 -0600
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
In-Reply-To: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
References: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu>

Roopa,

At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports).  See here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155

Also, are the returned hits specific for the genome?  You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why):

http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi
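
One more thing that may be worth checking in the code quoted below (an
observation from reading it, not something confirmed in this thread): the
ENTREZ_QUERY value is written in single quotes as '$organism[ORGN]'. Perl
does not interpolate variables inside single quotes, so the literal text
$organism[ORGN] would be sent rather than the organism name. The sketch
below shows the difference using a plain hash, %header, as a stand-in for
the RemoteBlast HEADER hash; the example organism value is an assumption:

```perl
use strict;
use warnings;

# Example value; in the quoted code this comes from the web form.
my $organism = 'Trypanosoma brucei';

# Stand-in for %Bio::Tools::Run::RemoteBlast::HEADER (illustration only).
my %header;

# Single quotes: no interpolation, the variable name is sent literally.
my $literal = '$organism[ORGN]';

# Concatenation keeps the [ORGN] suffix out of the variable lookup.
$header{'ENTREZ_QUERY'} = $organism . '[ORGN]';

print "single-quoted: $literal\n";
print "concatenated:  $header{'ENTREZ_QUERY'}\n";
```

Concatenation is used here because a bare "$organism[ORGN]" in double
quotes would be read as an element of an array named @organism.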

chris 
 
On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I am working with RemoteBlast. I am trying to pass two parameters into the
> remote BLAST code. They are:
> 
> 1. The input sequence that has to be sent to BLAST
> 
> 2. Organism (the organism to restrict the search to, e.g. Trypanosoma brucei)
> 
> When I take the organism parameter as input from the user, through a web
> page, RemoteBlast does not give any results, i.e. it says that no
> alignments were found.
> 
> But when I hard-code the organism in the code, it gives me results, i.e.
> 3 hits.
> 
> I cannot understand this problem. Could anybody please help me with it?
> 
> My code is
> 
> sub blastcode
> {
> 
> $input1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $input1;
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>         '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>               print OUTFILE @params;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a parameter
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
> #change a parameter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
> '-Organism' => $organism );
> 
> while (my $input = $str->next_seq())
> 
> {
> #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>   my $r = $factory->submit_blast($input);
> 
>   # my $r = $factory->submit_blast('amino.fa');
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
> 
>     foreach my $rid ( @rids ) {
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>      #    open(BLASTDEBUGFILE,'>',$debugfile);
>       #   print BLASTDEBUGFILE $result->next_hit();
>        #  close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> $factory->save_output($filename);
> 
>       # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> }
> 
> Regards,
> Roopa.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From robert.bradbury at gmail.com  Fri Dec  4 13:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID: <deaa866a0912041027r71c49f58n7d467f050c2f49c6@mail.gmail.com>

Background:
I have been involved in aging research off and on for ~16 years.  My initial
focus was on the eventual decline of the "program" (because DNA has no ECC
and only limited redundancy), so my initial work (in the early 1990s)
focused on DNA repair genes (of which there are about 150 in the human
genome) [1,2].  Most recently I have focused on the DNA double-strand
break repair processes (NHEJ) as a fundamental cause of aging because it may
corrupt the genomes of individual cells.  (And as most programmers would
agree -- break the code and you break the program.)
Michael Lieber at UCLA has estimated that by the time a human is ~70, on the
order of several hundred genes in one's cells have been corrupted (which may
have an indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene,
there are on the order of 18 SNPs and 8 possible phosphorylation sites (not
to mention other potential modification sites).  Add to this the fact that
Methionine and Tryptophan, and to a lesser extent Cysteine, are more
susceptible to single base mutations (because even a single base
mutation/repair alters the codon->amino acid coding).  There are
various programs to analyze such proteins for the critical sites -- SIFT and
the various programs pointed to by their sites.  Now it seems to me that one
could attack this problem by integrating SNPs, mutations, etc. at the
critical sites.  Here "critical" sites may or may not coincide with normal
SNPs (which presumably are primarily at non-critical sites); they are the
positions where changing the coding sequence to a non-synonymous amino acid
potentially breaks the protein (the real interpretation of which will not be
understood until population studies are done).
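
The point about Methionine, Tryptophan and Cysteine can be checked with a few lines of plain Perl.  The following is only a sketch, assuming the standard genetic code; the helper name synonymous_neighbours is made up for this illustration and is not a BioPerl function.  For a given codon it counts how many of the nine possible single-base substitutions leave the amino acid unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, codons enumerated with bases in T, C, A, G order
# (first base changes slowest, third base fastest).
my @bases = qw(T C A G);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aa, $i++, 1);
        }
    }
}

# Hypothetical helper: count how many of the nine single-base
# substitutions of a codon still encode the same amino acid.
sub synonymous_neighbours {
    my ($codon) = @_;
    my $count = 0;
    for my $pos (0 .. 2) {
        for my $base (@bases) {
            next if $base eq substr($codon, $pos, 1);
            my $mutant = $codon;
            substr($mutant, $pos, 1) = $base;
            $count++ if $code{$mutant} eq $code{$codon};
        }
    }
    return $count;
}

for my $codon (qw(ATG TGG TGT TGC GGG)) {
    printf "%s (%s): %d of 9 substitutions are synonymous\n",
        $codon, $code{$codon}, synonymous_neighbours($codon);
}
```

Under the standard code, ATG (Met) and TGG (Trp) have no synonymous single-base neighbour, TGT and TGC (Cys) have one each, and a four-fold degenerate codon such as GGG (Gly) has three.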

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is
there not a BioPerl function which simply enables a visual interpretation of
the critical sites of the protein?"  I.e. some color-coded representation of
the protein (which presumably has some augmented functionality to determine
things like probability or statistical information).  I.e. hand the function
a .fasta file and it will give you a visual (colored) analysis of the
critical nature of specific a.a. -- i.e. something which could be used by
genomic or SNP analysis (such as I presume is being done by 23andme -- as
well as other organizations) to begin to separate out the variations in the
human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and to a lesser extent Perl experience to
contribute to this -- I lack the BioPerl wisdom to make it generally
available.

If anyone has some suggestions as to what functions/modules might be of use
(in providing a "single-look" view of gene a.a. whose mutations may be more
or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press
(2006)
2. "Aging of the Genome",  J. Vijg, Oxford University Press (2007)


From maj at fortinbras.us  Sun Dec  6 17:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 6 Dec 2009 17:54:00 -0500
Subject: [Bioperl-l] bioperl-mode new feature: base class browsing
Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife>

Hi All, 
You can now browse pod of the base/parent classes of bioperl modules
with one keystroke using the latest update of bioperl-mode.
See http://bioperl.org/wiki/Emacs_bioperl-mode
Press "B" or "P" while in pod view to get a completion list 
of the parent classes for the module whose pod you're viewing.
cheers, 
MAJ


From mmokrejs at ribosome.natur.cuni.cz  Mon Dec  7 15:33:48 2009
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Mon, 07 Dec 2009 21:33:48 +0100
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
References: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz>

Hi,
  I just stumbled across this older posting ... maybe you want to exploit
SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has
a remote API available.
Martin

Robert Bradbury wrote:
> I would like to know whether or not anyone has attempted to create a
> "generalized" reciprocal blast component for BioPerl?
> 
> One sees papers all the time where they discuss running reciprocal blasts to
> compare a new species to an old "standard" species or a set of species or
> running an all-to-all set of comparisons to match up all of the "known"
> proteins from species and determine which are outliers (and therefore
> "novel").  There are also accumulating merged sets in NCBI HomoloGene (which
> seems to be some strict subset (perhaps a dozen) of "well sequenced" genomes)
> and Ensembl (which seems to be working with a much larger set of 40-50
> genomes, some of which may be somewhat incomplete and are certainly poorly
> "explored").
> 
> I have, I believe, seen code "fragments" from various authors, perhaps some
> on the BioPerl list, which perform some major subset of a typical
> "reciprocal blast".
> 
> Now what I am looking for is a relatively generalizable some-to-some
> reciprocal blast utility.  I want to be able to specify the genes (or gene
> family), e.g. some of the ~150 known DNA repair genes.  It would be helpful
> to also specify how "tolerant" the blast "true reciprocal" criteria are.
> There are some genes where there is a very strict 1-to-1 relationship across
> many genomes.  But for genes which involve relatively standard domains, e.g.
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
> example it's more like 5-to-5 and it would be really nice to be able to
> specify the strictness or quality level [1] for "matching" genes (and even
> which genes are to be excluded because they are known to be false
> homologues).
> 
> Then to top this off I want to be able to combine known public (e.g.
> HomoloGene / UniGene / Ensembl) databases with perhaps local private
> databases or database subsets (e.g. emerging or specialized genomes).
> 
> The goal here of course is to determine the precise phylogenetic relationships
> between all of the DNA repair genes and how there may be gain / loss /
> evolution of function that can be related to species characteristics (size,
> longevity, etc.).
> 
> Is there a generalized reciprocal blast component in BioPerl?  Or is it a
> "build-it-yourself" situation (that I have to believe has been built
> probably a few dozen times by various researchers / organizations /
> companies)?
> 
> Thanks,
> Robert Bradbury
> 
> 1. This would be handled in BioPerl with a customizable user function which
> could be tailored to handle specific cases -- for example a function which
> when handed a set of 100 potential "matches" could go through those 100
> matches, identify common domains, and then "re-rate" matches based on
> considerations such as the type and number of common domains, domains being
> in the same order, etc.  I.e. criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.
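
The core test behind a "some-to-some" reciprocal blast with adjustable strictness can be sketched in plain Perl.  This is only an illustration under stated assumptions: the BLAST reports are assumed to have been parsed already into ranked hit lists, the gene IDs are invented, and reciprocal_hits and top_hits are hypothetical names, not BioPerl functions.  A pair counts as reciprocal when each gene appears among the other's top N hits, so N = 1 is the strict 1-to-1 case and a larger N is more tolerant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep at most the first $n elements of a ranked hit list.
sub top_hits {
    my ($hits, $n) = @_;
    my @ranked = @{ $hits || [] };
    splice(@ranked, $n) if @ranked > $n;
    return @ranked;
}

# Return [gene_a, gene_b] pairs where gene_b is among the top $top_n hits
# of gene_a AND gene_a is among the top $top_n hits of gene_b.
sub reciprocal_hits {
    my ($a_vs_b, $b_vs_a, $top_n) = @_;
    $top_n = 1 unless defined $top_n;
    my @pairs;
    for my $gene_a (sort keys %$a_vs_b) {
        for my $gene_b (top_hits($a_vs_b->{$gene_a}, $top_n)) {
            my @back = top_hits($b_vs_a->{$gene_b}, $top_n);
            push @pairs, [$gene_a, $gene_b] if grep { $_ eq $gene_a } @back;
        }
    }
    return @pairs;
}

# Invented example data: hits are ranked best first.
my %human_vs_mouse = (
    hsRAD51 => [qw(mmRad51 mmDmc1)],
    hsDMC1  => [qw(mmDmc1 mmRad51)],
    hsXYZ   => [qw(mmRad51)],
);
my %mouse_vs_human = (
    mmRad51 => [qw(hsRAD51 hsDMC1)],
    mmDmc1  => [qw(hsDMC1 hsRAD51)],
);

for my $n (1, 2) {
    my @pairs = reciprocal_hits(\%human_vs_mouse, \%mouse_vs_human, $n);
    print "top $n: ", join(', ', map { "$_->[0]<->$_->[1]" } @pairs), "\n";
}
```

The user-supplied "re-rating" function described in the footnote could be applied to the ranked lists before they are handed to such a routine.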


From robert.bradbury at gmail.com  Mon Dec  7 15:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>

This comment could also have a subject line: "Why does Bioperl/get_sequence
fork at all?"  Why are not all operations sequential?  And if this is a
"default" mode that I'm unaware of -- how do I ever write a reliable BioPerl
script if I have little or no control over what the program uses when it
runs?  I may have days, so I can bear the burden of relatively slow results
(and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a
subset of the NCBI sequences.  It "mostly" works, in that it returns a
seemingly complete .bls result file but when attempting to look at the
sequences (so it can more accurately summarize the information from the
results than a standard blast report allows) it terminates prematurely with
errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
    $seq = get_sequence('GenBank', $accsn);

Now this can be a problem if NCBI/Genbank fails due to load conditions --
but this specific failure (which is repeatable) is most likely due to hitting
the user process limit restrictions.  Small blast results work
fine -- it's only if the Blast has returned several hundred hits that it runs
into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
queries (to get a sequence) with complete disregard of the environment
(process limits, NCBI limits, etc.).  But I do not know enough about how
this works to point a finger at some specific function.  It looks as if
get_sequence results are accumulated, summarized, etc. without ever
having issued the respective wait()-variant calls to collect former children.
[This IMO would clearly be a bug.]
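
For reference, the fork-then-collect pattern being asked for looks roughly like this in plain Perl.  This is a generic sketch for a POSIX-like system, not code taken from BioPerl; run_in_child is a hypothetical helper.  Each child is reaped with waitpid() before the next one is started, so finished children do not accumulate against the per-user process limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper: run a job in a child process, read its output
# through a pipe, and reap the child before returning.
sub run_in_child {
    my ($job) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {                    # child
        close $reader;
        print {$writer} $job->();
        close $writer;                  # flushes the pipe
        _exit(0);                       # leave without running END blocks
    }
    close $writer;                      # parent
    my $output = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);                   # collect the finished child
    return ($output, $? >> 8);
}

for my $n (1 .. 5) {
    my ($output, $status) = run_in_child(sub { "job $n done" });
    print "$output (exit status $status)\n";
}
```

Because the parent waits for each child in turn, at most one extra process exists at any time, at the cost of running the jobs strictly one after another.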

This could be addressed by allowing the BioPerl library to run in 3 modes.
 (1) Completely synchronous -- if you fork you wait until it's done -- and
you collect "it"; if any fork fails, then one either collects the process or
switches to the non-conservative mode.

Robert


From cjfields at illinois.edu  Mon Dec  7 16:08:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 7 Dec 2009 15:08:40 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <A36A88C9-D94C-4559-A629-56EB8F374DAC@illinois.edu>

Robert, 

If you use the relative components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not.  All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly.

See the POD for those specific modules for more information.

chris
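
A minimal sketch of what using the modules directly might look like, assuming the -retrievaltype option described in the Bio::DB::WebDBSeqI POD ('tempfile' downloads to a temporary file instead of forking a pipeline child).  The helper fetch_sequentially and the one-second pause are illustrative choices, not part of BioPerl; check the POD of your installed version before relying on the option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fetch accessions one at a time, pausing between
# requests.  $db is any object with a get_Seq_by_acc method, such as a
# Bio::DB::GenBank instance.
sub fetch_sequentially {
    my ($db, $pause, @accessions) = @_;
    my (%seq_for, @failed);
    for my $acc (@accessions) {
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        if ($seq) { $seq_for{$acc} = $seq }
        else      { push @failed, $acc }
        sleep $pause if $pause;
    }
    return (\%seq_for, \@failed);
}

# Only touch the network when accessions are given on the command line.
if (@ARGV) {
    require Bio::DB::GenBank;
    # Assumption: 'tempfile' retrieval avoids the forked pipeline.
    my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
    my ($seq_for, $failed) = fetch_sequentially($gb, 1, @ARGV);
    for my $acc (sort keys %$seq_for) {
        printf "%s\t%s\n", $acc, $seq_for->{$acc}->desc || '';
    }
    warn "could not retrieve: @$failed\n" if @$failed;
}
```

Run it with one or more accession numbers as arguments; accessions that fail are reported at the end instead of aborting the whole run.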



From jason at bioperl.org  Mon Dec  7 16:24:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Dec 2009 13:24:54 -0800
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>

Robert -

You seem to be mixing the blast remote and the sequence query  
retrieval problems. These messages are related to the remote retrieval  
of sequences.
  It is hard to tell from your message specifically which modules you  
are using or how you are querying NCBI - there are several ways to do  
this either with the NCBI tools or the Bio::DB::GenBank.
  If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to adhere to the wait variant that
NCBI requests, but I don't think the Bio::DB::GenBank get_Seq_by_acc method
does anything of the sort (at least when it was originally written).

I always advocate if you want highly available and reliable access to  
sequences you should download the nr or whichever DB and use the local  
indexing tools for the retrieval.  Once you start doing hundreds of  
queries I don't see any good reason to be doing the query against NCBI  
directly given unreliabilities of the web and services. Local  
databases are faster and more reliable for most people so I urge you  
take advantage of the tools which provide local database access with  
the same APIs.


I would like to comment that the tone of your posts to the list is
not particularly helpful.  I wonder if you are actually asking for
help or just interested in complaining when things don't work as
you expect?  This is a collaborative and volunteer-only project, with
the principle of working together to make a useful toolkit.  We
encourage you to build programs and applications from this base that  
suit your needs, but not all things will be directly implemented in  
the toolkit if they aren't generic enough (at least that is my  
feeling, the other Core devs help with these decisions).
   If there is a useful, generic, and reusable part we would like that  
to be part of the API. Otherwise we suggest the new application that  
fits a developer's vision. We encourage you to write (and publish)  
that application separately, but certainly encourage bug (and fixes)  
submissions and also code contributions for new features where they  
can be seen as generally useful.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From Jonas_Schaer at gmx.de  Tue Dec  8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read in some FASTA files. At first it worked fine, but now I have some FASTA files in a slightly different format (not all lines have the same length!), and indexing them fails:

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.


Is there any way to use these FASTA files with different line lengths with this fasta.pm module, or will I have to change the format of my FASTA files (big databases...)?

Thanks in advance for any help! 

Regards, Jonas


From awitney at sgul.ac.uk  Tue Dec  8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID: <DB3D347F-EB9E-4A59-87D2-3E1A5FACF154@sgul.ac.uk>

Hi,

I have been generating some trees with Phylip (pars) and then  
processing them with Bioperl. These trees are generated by comparing  
multiple strains of a bacterial organism by presence/absence (0/1)  
calls for each gene.

I was wondering if there was any package in Bioperl to try to
determine if any specific genes were associated with specific branches
of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam


From jason at bioperl.org  Tue Dec  8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>

You can run sreformat (HMMER) or the bp_sreformat.pl script in
scripts/utilities (it is also installed when you install the BioPerl scripts):
$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or
$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa


-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Tue Dec  8 23:30:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Dec 2009 22:30:26 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference
Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu>

All,

For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego.  This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13.  The exact day and time is somewhat flexible depending on attendees' schedules.

For those interested, sign up here:

http://www.bioperl.org/wiki/GMOD_2010_Meeting

For those interested in attending the GMOD meeting or PAG:

http://gmod.org/wiki/January_2010_GMOD_Meeting

I can envision the following items popping up:

* Refactoring of Alignment and GFF3/FeatureIO
* Addressing BioPerl's monolithic nature
* Moose and Perl 6
* Documentation

Any others?

chris


From akarger at CGR.Harvard.edu  Wed Dec  9 10:01:45 2009
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 9 Dec 2009 10:01:45 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>

> Is there any way to use these fasta files with different length of
> lines with this fasta.pm module or will i have to change the format
> of my fasta-files(big databases...) ?
> 

Jonas,

It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back.

To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy).

The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like).

Let me know if you have a problem.

-Amir Karger
Life Sciences Research Computing, FAS IT
Harvard University


From Kevin.M.Brown at asu.edu  Wed Dec  9 10:26:22 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 9 Dec 2009 08:26:22 -0700
Subject: [Bioperl-l] fasta format
In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>

Even easier to accomplish in one step. Read in the fasta file and output
it right to another fasta file with SeqIO

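use Bio::SeqIO;
# $file is the path to the input FASTA file
my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
# next_seq() returns the next sequence object, or undef at end of input
while (my $seq = $in->next_seq){$out->write_seq($seq);}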

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  



From Russell.Smithies at agresearch.co.nz  Wed Dec  9 14:44:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Dec 2009 08:44:41 +1300
Subject: [Bioperl-l] fasta format
In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
	<1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>

It's even easier as the script is already written for you :-)

bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa


--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kevin Brown
> Sent: Thursday, 10 December 2009 4:26 a.m.
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
> 
> Even easier to accomplish in one step. Read in the fasta file and output
> it right to another fasta file with SeqIO
> 
> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
> while (my $seq = $in->next_seq){$out->write_seq($seq);}
> 
> Kevin Brown
> Center for Innovations in Medicine
> Biodesign Institute
> Arizona State University
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> > Sent: Wednesday, December 09, 2009 8:02 AM
> > To: Jonas Schaer; bioperl-l at bioperl.org
> > Subject: Re: [Bioperl-l] fasta format
> >
> > > Is there any way to use these fasta files with different line
> > > lengths with this fasta.pm module, or will I have to change the
> > > format of my fasta files (big databases...)?
> > >
> >
> > Jonas,
> >
> > It's not Bioperl, but for a quick fix you can use the
> > Scriptome. Use the change_fasta_to_tab script
> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> > format__change_fasta_to_tab_) to change your FASTA into a
> > tab-delimited file. Then use the next tool
> > (change_tab_to_fasta) to change your files back.
> >
> > To use a tool: change the input and output file names on the
> > website, then cut and paste the Perl script from the green
> > box into a CMD window. The script works one sequence at a
> > time, so it doesn't need a lot of memory. (As long as you
> > have enough disk space to store the tab-delimited copy).
> >
> > The recreated FASTAs will be 60 characters per line (although
> > you can hand-edit the line after you paste it to be whatever
> > number of characters you'd like).
> >
> > Let me know if you have a problem.
> >
> > -Amir Karger
> > Life Sciences Research Computing, FAS IT
> > Harvard University
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From maj at fortinbras.us  Wed Dec  9 15:18:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 9 Dec 2009 15:18:08 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
	<18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife>

$ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, 
">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas

----- Original Message ----- 
From: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
To: "'Kevin Brown'" <Kevin.M.Brown at asu.edu>; <bioperl-l at bioperl.org>
Sent: Wednesday, December 09, 2009 2:44 PM
Subject: Re: [Bioperl-l] fasta format


> It's even easier as the script is already written for you :-)
>
> bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa
>
>
> --Russell
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Kevin Brown
>> Sent: Thursday, 10 December 2009 4:26 a.m.
>> To: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] fasta format
>>
>> Even easier to accomplish in one step. Read in the fasta file and output
>> it right to another fasta file with SeqIO
>>
>> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
>> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
>> while (my $seq = $in->next_seq){$out->write_seq($seq);}
>>
>> Kevin Brown
>> Center for Innovations in Medicine
>> Biodesign Institute
>> Arizona State University
>>
>> > -----Original Message-----
>> > From: bioperl-l-bounces at lists.open-bio.org
>> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
>> > Sent: Wednesday, December 09, 2009 8:02 AM
>> > To: Jonas Schaer; bioperl-l at bioperl.org
>> > Subject: Re: [Bioperl-l] fasta format
>> >
>> > > Is there any way to use these fasta files with different line
>> > > lengths with this fasta.pm module, or will I have to change the
>> > > format of my fasta files (big databases...)?
>> > >
>> >
>> > Jonas,
>> >
>> > It's not Bioperl, but for a quick fix you can use the
>> > Scriptome. Use the change_fasta_to_tab script
>> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
>> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
>> > format__change_fasta_to_tab_) to change your FASTA into a
>> > tab-delimited file. Then use the next tool
>> > (change_tab_to_fasta) to change your files back.
>> >
>> > To use a tool: change the input and output file names on the
>> > website, then cut and paste the Perl script from the green
>> > box into a CMD window. The script works one sequence at a
>> > time, so it doesn't need a lot of memory. (As long as you
>> > have enough disk space to store the tab-delimited copy).
>> >
>> > The recreated FASTAs will be 60 characters per line (although
>> > you can hand-edit the line after you paste it to be whatever
>> > number of characters you'd like).
>> >
>> > Let me know if you have a problem.
>> >
>> > -Amir Karger
>> > Life Sciences Research Computing, FAS IT
>> > Harvard University


From kellert at ohsu.edu  Wed Dec  9 19:36:13 2009
From: kellert at ohsu.edu (Tom Keller)
Date: Wed, 9 Dec 2009 16:36:13 -0800
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>

Greetings,
Is there a simple way to map a list of ensembl ids to the NCBI gis?

thanks,
Tom

Thomas (Tom) Keller
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/


From cjfields at illinois.edu  Wed Dec  9 20:59:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 9 Dec 2009 19:59:37 -0600
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu>

Tom,

Probably best to do this via BioMart:

http://www.ensembl.org/biomart/

I would assume you can do this via the Ensembl Perl API as well.

Also, have a look at the UniProt ID Mapper:

http://www.uniprot.org/?tab=mapping

chris

On Dec 9, 2009, at 6:36 PM, Tom Keller wrote:

> Greetings,
> Is there a simple way to map a list of ensembl ids to the NCBI gis?
> 
> thanks,
> Tom
> 
> Thomas (Tom) Keller
> kellert at ohsu.edu
> 503.494.2442
> 6339b R Jones Hall (BSc/CROET)
> www.ohsu.edu/xd/research/research-cores/dna-analysis/


From lovebaby39 at gmail.com  Thu Dec 10 09:22:14 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 22:22:14 +0800
Subject: [Bioperl-l] about bioperl issue
Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC>

Dear 

The following is code. 


--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
            'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  =>"test_query",
                       -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

 while (my $result_rb =  $blast_report_rb-> next_result ) {
  while (my $hit_rb = $result_rb->next_hit()){
   while (my $hsp_rb = $hit_rb->next_hsp()){
    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
    #print " ",$hit->name,"\n";
   }
  }
 }

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and  "score", but I don't know how  to get the word which is in red color. (or please see attachment.)
------------------------------------------------------------------------------------------------------------------ 
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
                   |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------ 
  
I will appreciate if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091210/0431bad7/attachment-0003.txt>

From SMarkel at accelrys.com  Thu Dec 10 09:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red but the three strings in the
pairwise alignment display can be obtained from an HSP using

    $hsp->query_string()
    $hsp->hit_string()
    $hsp->homology_string()
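
As a rough sketch of how these three calls fit together, the helper below rebuilds the three-line view from an HSP. The sub name hsp_alignment_text is made up for illustration, and the labels and spacing are an assumption rather than BLAST's exact layout; the start and end coordinates shown in the report are not part of these strings.

```perl
use strict;
use warnings;

# Build a three-line pairwise view from any HSP-like object that provides
# query_string(), homology_string() and hit_string().
# hsp_alignment_text is a hypothetical helper, not part of BioPerl.
sub hsp_alignment_text {
    my ($hsp) = @_;
    return join "\n",
        'Query: ' . $hsp->query_string,
        '       ' . $hsp->homology_string,   # the row of '|' match characters
        'Sbjct: ' . $hsp->hit_string;
}

# Inside the innermost loop of the posted script this could be called as:
#     print hsp_alignment_text($hsp_rb), "\n";
```

Only query_string() is needed if the goal is just the query sequence of the aligned region.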

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hsueh
Sent: Thursday, 10 December 2009 6:22 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] about bioperl issue
Importance: High

Dear 

The following is code. 


--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
            'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  =>"test_query",
                       -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

 while (my $result_rb =  $blast_report_rb-> next_result ) {
  while (my $hit_rb = $result_rb->next_hit()){
   while (my $hsp_rb = $hit_rb->next_hsp()){
    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
    #print " ",$hit->name,"\n";
   }
  }
 }

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and  "score", but I don't know how  to get the word which is in red color. (or please see attachment.)
------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
                   |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------ 
  
I will appreciate if you could tell me how to do it.
Thank you.

Reginald Hsueh


From David.Messina at sbc.su.se  Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.


Dave


From David.Messina at sbc.su.se  Thu Dec 10 10:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for,  I think.

Namely:
   $hsp->query_string()
OR
   $hsp->hit_string()


Dave


On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
> 
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
> 
> Thank you
> 
> Reginald Hsueh
> 
> ------------------------------------------------------------------------------------------------------------------------------
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>                  |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
> ------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> 
> --------------------------------------------------
> From: "Dave Messina" <David.Messina at sbc.su.se>
> Sent: Thursday, December 10, 2009 11:09 PM
> To: "Hsueh" <lovebaby39 at gmail.com>
> Cc: <bioperl-l at bioperl.org>
> Subject: Re: [Bioperl-l] about bioperl issue
> 
>> Hi Reginald,
>> 
>> None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.
>> 
>> Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.
>> 
>> 
>> Dave


From lovebaby39 at gmail.com  Thu Dec 10 10:53:00 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 23:53:00 +0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <AEA3314B45B14452A4BD1E3A2235AA5D@SHAPC>

Dear Dave Messina

Thank you for your replies.

Reginald Hsueh

--------------------------------------------------
From: "Dave Messina" <David.Messina at sbc.su.se>
Sent: Thursday, December 10, 2009 11:36 PM
To: "Hsueh" <lovebaby39 at gmail.com>
Cc: <bioperl-l at bioperl.org>
Subject: Re: [Bioperl-l] about bioperl issue

> Hi Reginald,
>
> Please keep all replies on the list so that everyone can follow the 
> thread.
>
> In a separate email, Scott gave the answer you were looking for,  I think.
>
> Namely:
>   $hsp->query_string()
> OR
>   $hsp->hit_string()
>
>
>
> Dave
>
>
>
>
> On Dec 10, 2009, at 16:31, Hsueh wrote:
>
>> Dear Dave Messina
>>
>> I need to get the string that is 
>> "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
>>
>> Thank you
>>
>> Reginald Hsueh
>>
>> ------------------------------------------------------------------------------------------------------------------------------
>> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 
>> 206
>>                  |||||| ||||||||||||||||||    |||| || |||||| 
>> |||||||||||| ||
>> Sbjct: 114   ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 
>> 173
>> ------------------------------------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>> --------------------------------------------------
>> From: "Dave Messina" <David.Messina at sbc.su.se>
>> Sent: Thursday, December 10, 2009 11:09 PM
>> To: "Hsueh" <lovebaby39 at gmail.com>
>> Cc: <bioperl-l at bioperl.org>
>> Subject: Re: [Bioperl-l] about bioperl issue
>>
>>> Hi Reginald,
>>>
>>> None of the words in your email or the attachment are colored red;
>>> unfortunately, any kind of formatting tends to get removed from emails
>>> sent to mailing lists.
>>>
>>> Could you be more specific about what part of the blast report you are 
>>> not able to get? You could even just copy and paste that particular bit 
>>> of the report into your reply if it's not clear what to call it.
>>>
>>>
>>> Dave


>>>>Dear
>>>>
>>>>The following is code.
>>>>
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>my @params_rb = ( 'program'  => 'blastn',
>>>>            'database' => 'DB\\RB_GUS\\RB_GUS');
>>>>my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);
>>>>
>>>>my $input_rb = Bio::Seq->new(-id  =>"test_query",
>>>>                       -seq => $testline2);
>>>>my $blast_report_rb = $factory_rb->blastall($input_rb);
>>>>
>>>> while (my $result_rb =  $blast_report_rb-> next_result ) {
>>>>  while (my $hit_rb = $result_rb->next_hit()){
>>>>   while (my $hsp_rb = $hit_rb->next_hsp()){
>>>>    print $hit_rb->name,"\nevalue = " ,  $hsp_rb->evalue , "\t score = " 
>>>> , $hsp_rb->score , "\n" ;
>>>>    #print " ",$hit->name,"\n";
>>>>   }
>>>>  }
>>>> }
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>
>>>>I know how to get "name", "evalue" and  "score", but I don't know how 
>>>>to get the word which is in red color. (or please see attachment.)
>>>>------------------------------------------------------------------------------------------------------------------
>>>>Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 
>>>>206
>>>>                   |||||| ||||||||||||||||||    |||| || |||||| 
>>>> |||||||||||| ||
>>>>Sbjct: 114 
>>>>ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
>>>>------------------------------------------------------------------------------------------------------------------
>>>>
>>>>I will appreciate if you could tell me how to do it.
>>>>Thank you.
>>>>
>>>>Reginald Hsueh 


From pg4 at sanger.ac.uk  Thu Dec 10 15:50:40 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT)
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
References: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
Message-ID: <alpine.DEB.1.10.0912102042180.8440@deskpro17122.dynamic.sanger.ac.uk>


If you are mapping Ensembl genes to NCBI genes (via the Ensembl API or BioMart),
please read this recent thread at ensembl-dev:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html

It seems that the Ensembl gene mapping to NCBI is done through the translation,
so noncoding genes do not have a corresponding NCBI gene mapped.


   -Pablo


> ------------------------------
>
> Message: 4
> Date: Wed, 9 Dec 2009 19:59:37 -0600
> From: Chris Fields <cjfields at illinois.edu>
> Subject: Re: [Bioperl-l] how to map ensembl id to NCBI gi
> To: Tom Keller <kellert at ohsu.edu>
> Cc: BioPerl-List <bioperl-l at bioperl.org>
> Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C at illinois.edu>
> Content-Type: text/plain; charset=us-ascii
>
> Tom,
>
> Probably best to do this via BioMart:
>
> http://www.ensembl.org/biomart/
>
> I would assume you can do this via the Ensembl Perl API as well.
>
> Also, have a look at the UniProt ID Mapper:
>
> http://www.uniprot.org/?tab=mapping
>
> chris
>
> On Dec 9, 2009, at 6:36 PM, Tom Keller wrote:
>
>> Greetings,
>> Is there a simple way to map a list of ensembl ids to the NCBI gis?
>>
>> thanks,
>> Tom
>>
>> Thomas (Tom) Keller
>> kellert at ohsu.edu
>> 503.494.2442
>> 6339b R Jones Hall (BSc/CROET)
>> www.ohsu.edu/xd/research/research-cores/dna-analysis/

====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin at sanger.ac.uk
====================================================================


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From umjsm at leeds.ac.uk  Fri Dec 11 11:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to
write one chain of a PDB file to a file. I have tried a lot of things, but
the script that I think should work is the following:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');

for my $chain ($struc->get_chains) {
	if($chain->id eq "A"){
		$new_entry->chain($chain);
		last;
	}
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');#
$out->write_structure($new_entry);

It doesn't. I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK:
Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK:
Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK:
Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of the object
Bio::Structure::Entry requires as input an object of type Chain.

Any solution will be very welcome.

best regards,
Joan


From wkretzsch at gmail.com  Fri Dec 11 14:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files
	generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community.  I've created a perl module that
reads in msOUT files generated by Hudson's ms.  As far as I
understand, there is no SeqIO module to read and output these files?
If so, I propose to create a module that does this.  Any suggestions?

Thanks,
Warren Kretzschmar


From maj at fortinbras.us  Fri Dec 11 14:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT
	files generated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,
I say go for it. You'll want to have a look at
http://bio.perl.org/wiki/Advanced_BioPerl
which explains most of our tips and "policies" for prospective
code contributors, as well as
http://bio.perl.org/wiki/HOWTO:SeqIO
which details SeqIO from the user's perspective. Look
carefully at some Bio::SeqIO::* modules for implementation
details. If you have code to propose, use
http://bugzilla.bioperl.org
and enter a new enhancement, where you can upload
your module for us to review.
MAJ
----- Original Message ----- 
From: "Warren W. Kretzschmar" <wkretzsch at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 11, 2009 2:22 PM
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by
Hudson's ms


> Hi,
> I'm new to the bioperl community.  I've created a perl module that
> reads in msOUT files generated by Hudson's ms.  As far as I
> understand, there is no SeqIO module to read and output these files?
> If so, I propose to create a module that does this.  Any suggestions?
>
> Thanks,
> Warren Kretzschmar


From bosborne11 at verizon.net  Fri Dec 11 15:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>

Joan,

It looks to me like the first argument to the add_chain() method has
to be a Model object, and the second is the Chain itself. See
Structure/Entry.pm, for example. However, if you're seeing some
documentation that says something else then tell us where; it needs
to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of
which has one or more Chains. This allows you to build macromolecular
complexes (an Entry), which could have more than one defined protein
or protein complex (Models).

Brian O.
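
If only the coordinate records of one chain are needed, a plain-text filter is a possible workaround that sidesteps the Entry/Model/Chain objects entirely. This is a minimal sketch and not a Bio::Structure solution: the sub name filter_pdb_chain is hypothetical, it relies on the fixed-column PDB layout in which the chain identifier sits in column 22, and it ignores MODEL/ENDMDL and ANISOU records, so multi-model files would need more care.

```perl
use strict;
use warnings;

# Copy only the ATOM/HETATM/TER records of one chain from a PDB file handle.
# In the fixed-column PDB format the chain identifier is column 22
# (offset 21 when counting from zero).
# filter_pdb_chain is a hypothetical helper, not part of BioPerl.
sub filter_pdb_chain {
    my ($in_fh, $out_fh, $chain_id) = @_;
    my $kept = 0;
    while (my $line = <$in_fh>) {
        next unless $line =~ /^(?:ATOM  |HETATM|TER)/;   # coordinate records only
        next unless length($line) > 21
                 && substr($line, 21, 1) eq $chain_id;   # keep the wanted chain
        print {$out_fh} $line;
        $kept++;
    }
    print {$out_fh} "END\n";
    return $kept;    # number of records written, excluding END
}

# Usage sketch (file names taken from the question):
#   open my $in,  '<', '101m.pdb' or die $!;
#   open my $out, '>', 'out.pdb'  or die $!;
#   filter_pdb_chain($in, $out, 'A');
```

Header and annotation records are dropped by this filter, so the output is a bare coordinate file.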

On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:

> Hello,
>
> I am trying to do a very easy thing but I can't get it to work. I
> want to write one chain of a PDB file to a file. I have tried a lot
> of things, but the script that I think should work is the following:
>
> use Bio::Structure::IO;
> use strict;
>
> my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> =>
> 'pdb');
> my $struc = $structio->next_structure;
>
> my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
>
> for my $chain ($struc->get_chains) {
> 	if($chain->id eq "A"){
> 		$new_entry->chain($chain);
> 		last;
> 	}
> }
>
> my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> 'pdb');#
> $out->write_structure($new_entry);
>
> It doesn't. I get the following error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: add_chain: first argument needs to be a Model object ()
>
> STACK: Error::throw
> STACK:
> Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> 368
> STACK:
> Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:335
> STACK:
> Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:391
> STACK:
> Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> Structure/Entry.pm:304
> STACK: read_pdb.pl:10
> -----------------------------------------------------------
>
> As far as I understand the documentation, the chain method of the object
> Bio::Structure::Entry requires as input an object of type Chain.
>
> Any solution will be very welcome.
>
> best regards,
> Joan
>


From awitney at sgul.ac.uk  Sun Dec 13 16:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you
see after clustering data.

I was wondering if anyone has managed to do this using bioperl? I can
draw the two separately, but can't quite seem to work out how to put the
two together and get the nodes to line up with the correct row of
clustering data.

Is there any particular module to look at?

thanks for any help

adam


From dhwani1030 at gmail.com  Sat Dec 12 15:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>

Hi,
I am very new to BioPerl, though I am somewhat familiar with Perl.

I write my perl programs in Notepad++ and run them in cmd.

Now, I want to run Bioperl programs. I just installed bioperl on my
computer. And I have a program using bioperl modules in Notepad++.

My question is how to run these programs. Can they be run in cmd as well, or
do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.


From eric_donaldson at med.unc.edu  Sun Dec 13 18:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID: <f77787b07d66b.4b252f3c@med.unc.edu>

Hello,

Today I downloaded bioperl 1.61 on my new macbook pro using fink. I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.


I am a novice at unix and bioperl so I do not know how to troubleshoot this. Would you please help me?

Thank you,

Eric


Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology


-------------- next part --------------
begin:vcard
n:Donaldson;Eric
fn:Eric F. Donaldson, PhD
tel;work:919.966.3881
org:University of North Carolina, School of Medicine;Epidemiology
adr:;;2107 McGavran-Greenberg Hall
CB# 7435
;Chapel Hill;NC;27599;USA
email;internet:eric_donaldson at med.unc.edu
email;home;internet:viralnerd at gmail.com
title:Research Assistant Professor
version:2.1
end:vcard

From jason at bioperl.org  Sun Dec 13 20:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f77787b07d66b.4b252f3c@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was  
deprecated several releases ago.
It was replaced with Bio::SearchIO

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Sun Dec 13 23:09:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 20:09:45 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f79059397d7fa.4b255f0b@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>

So did you install perl-5.10, or are you using the system perl? I'm not
sure whether you actually installed bioperl.pm via fink.

Your @INC or $PERL5LIB points to /sw/lib/perl5, which is one of the dirs
it would have installed into, but I don't think you actually installed
bioperl.

You can try running this (the leading $ is the shell prompt, not part of
the command):
$ locate Bio/SearchIO.pm

We'll see if any of the other osx/fink gurus are on the list that can  
help or you can install it via CPAN I guess.

-jason
On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:

>
> I actually tried a different blastparser that uses BIO::SearchIO and  
> got the same message:
>
> Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ 
> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / 
> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ 
> darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ 
> Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- 
> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at  
> blastparser.new.pl line 8.
> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>
> I suspect there is a path problem, but am not savvy enough to know  
> how to fix it.  I am really just a hacker.... I have several scripts  
> that I use regularly and that I know how to modify, but am lost when  
> they don't work...
>
> Thanks for any help,
>
> Eric
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Mon Dec 14 00:10:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 21:10:54 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f7a30bbc786b3.4b258092@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>

Eric -
please CC the bioperl list when responding so others can help - I  
can't be the only answerer.

Your @INC message doesn't include /sw/lib/perl5/5.8.8/ (where you found
SearchIO.pm), so you would need to make sure that directory is added to
your PERL5LIB. I expect there are some help docs on the perl sites on how
to get your PATHs in order.

Or you can just install via CPAN which will put it in the right path -  
there are docs on the bioperl website about installing via CPAN.

-jason
On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted  
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ 
> SearchIO.pm  I cannot access it.  Do I need to use the older version  
> of perl?
>
> Would it be better to install with CPAN?  If so, can you send me to  
> a page that has instructions?
>
> Thank you so much!
>
> ERic
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From awitney at sgul.ac.uk  Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
References: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
Message-ID: <4B260713.3070402@sgul.ac.uk>


bioperl programs are just perl programs, so you should run them in
exactly the same way as your perl programs, from the command line

HTH

adam



From umjsm at leeds.ac.uk  Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't use as an argument an object of type

Bio::Structure::Chain

I get an error like this (-->depends on the argument<--)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK:
Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------


And if I use a Chain object I get the error that I told you about.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
my $model = Bio::Structure::Model->new( -id  => '0');

for my $chain ($struc->get_chains) {
        if($chain->id eq "A"){
                $new_entry->add_chain($model,$chain);

                last;
        }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');
$out->write_structure($new_entry);


But I get an empty pdb

HEADER    DEFAULT CLASSIFICATION                  24-JAN-70
stru              
REMARK
1                                                                      
TER       1          A
0                                                      
MASTER                                                                          
END  

I am trying a lot of combinations, but I can't write a single chain into
a file. I don't know what I am doing wrong.

Thanks for helping

regards,
Joan


On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> Joan,
> 
> It looks to me like the first argument to the add_chain() method has  
> to be a Model object, the second is the Chain itself. See Structure/ 
> Entry.pm, for example. However if you're seeing some documentation  
> that says something else then tell us where, it needs to be corrected.
> 
> In Bio::Structure an Entry consists of one or more Models, each of which
> has one or more Chains. This allows you to build macromolecular
> complexes (an Entry), which could have more than one defined protein
> or protein complex (Models).
> 
> Brian O.
> 
> On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> 
> > Hello,
> >
> > I am trying to do a very easy thing but I can't get it to work. I want to
> > write a chain of a pdb to a file. I have tried a lot of things, but what I
> > think should work is the following script:
> >
> > use Bio::Structure::IO;
> > use strict;
> >
> > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> > =>
> > 'pdb');
> > my $struc = $structio->next_structure;
> >
> > my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> >
> > for my $chain ($struc->get_chains) {
> > 	if($chain->id eq "A"){
> > 		$new_entry->chain($chain);
> > 		last;
> > 	}
> > }
> >
> > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> > 'pdb');#
> > $out->write_structure($new_entry);
> >
> > It doesn't. I get the following error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK:
> > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> > 368
> > STACK:
> > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:335
> > STACK:
> > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:391
> > STACK:
> > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far as I understand the documentation, the chain method of
> > Bio::Structure::Entry requires a Chain object as input.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From fs5 at sanger.ac.uk  Mon Dec 14 07:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and if
I stick in a fake ID line, it warns about the missing sequence and the
fact that the features don't fit on the sequence (of length 0). 
Of course it's not difficult to write my own parser but I'm sure there
must be a BioPerl way of doing that that I have just overlooked. Thanks
for your help.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Mon Dec 14 09:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
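
A rough sketch of that wrapping step in shell, under several assumptions: the file names features.ft and wrapped.embl and the accession FAKE01 are made up for illustration, the ID and SQ lines follow the general EMBL flat-file layout, and the fake sequence is simply a run of "n" as long as the largest feature coordinate. Whether Bio::SeqIO accepts this exact header without warnings has not been tested here.

```shell
# Sample feature table standing in for the real file (lines from this thread).
cat > features.ft <<'EOF'
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
EOF

len=214812   # assumed: at least as large as the biggest feature coordinate

{
    # Fake ID line; the accession "FAKE01" is invented for this sketch.
    printf 'ID   FAKE01; SV 1; linear; genomic DNA; STD; UNC; %d BP.\n' "$len"
    printf 'XX\n'
    printf 'FH   Key             Location/Qualifiers\n'
    cat features.ft
    printf 'XX\n'
    printf 'SQ   Sequence %d BP; 0 A; 0 C; 0 G; 0 T; %d other;\n' "$len" "$len"
    # Fake sequence: $len copies of "n", 60 per line, indented by 5 spaces.
    head -c "$len" /dev/zero | tr '\0' 'n' | fold -w 60 | sed 's/^/     /'
    printf '\n//\n'   # record terminator
} > wrapped.embl
```

The resulting wrapped.embl could then be opened with Bio::SeqIO using -format => 'embl' and the features read back in the usual way.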


Dave


PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation


From eric_donaldson at med.unc.edu  Mon Dec 14 09:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
	<7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID: <f750f0a17830d.4b2603e0@med.unc.edu>

Thank you Jason. I appreciate the help.

Eric


Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From umjsm at leeds.ac.uk  Mon Dec 14 11:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
	<1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,


To extract a pdb chain into a file, I have had to add it atom by atom
to a new structure.

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new( -id  => 'structure_id');

for my $model ($struc->get_models){
	$new_struct->add_model($model);
	for my $chain ($struc->get_chains) {
		$new_struct->add_chain($model,$chain);
		if($chain->id eq "A"){
			foreach my $res ($struc->get_residues($chain)){
				$new_struct->add_residue($chain,$res);
				foreach my $atom  ($struc->get_atoms($res)){
					$new_struct->add_atom($res,$atom);
				}
			}
		}
		last;
	}
	last;
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
'pdb');
$out->write_structure($new_struct);

I suppose that there should be a more elegant way to do it.

If someone knows it and can explain it I will be very grateful.

kind regards, 
Joan

On Mon, 2009-12-14 at 10:39 +0000, Joan Segura Mora wrote:
> Hi Brian,
> 
> I am not calling the method add_chain, I am calling the method chain
> 
> http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6
> 
> and if I don't use as an argument an object of type
> 
> Bio::Structure::Chain
> 
> I get an error like this (-->depends of the argument<--)
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
> we want a Bio::Structure::Chain or a list of these
> 
> STACK: Error::throw
> STACK:
> Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
> STACK:
> Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
> STACK: read_pdb.pl:11
> -----------------------------------------------------------
> 
> 
> And if I use a Chain object I get the error that I told you.
> 
> I have try this code:
> 
> use Bio::Structure::IO;
> use strict;
> 
> my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' =>
> 'pdb');
> my $struc = $structio->next_structure;
> my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> my $model = Bio::Structure::Model->new( -id  => '0');
> 
> for my $chain ($struc->get_chains) {
>         if($chain->id eq "A"){
>                 $new_entry->add_chain($model,$chain);
> 
>                 last;
>         }
> }
> $new_entry->add_model($model);
> my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> 'pdb');
> $out->write_structure($new_entry);
> 
> 
> But I get an empty pdb
> 
> HEADER    DEFAULT CLASSIFICATION                  24-JAN-70
> stru              
> REMARK
> 1                                                                      
> TER       1          A
> 0                                                      
> MASTER                                                                          
> END  
> 
> I am trying a lot of combinations, but I can't write a single chain into
> a file. I don't know what I am doing wrong.
> 
> Thanks for helping
> 
> regards,
> Joan
> 
> 
> On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> > Joan,
> > 
> > It looks to me like the first argument to the add_chain() method has  
> > to be a Model object, the second is the Chain itself. See Structure/ 
> > Entry.pm, for example. However if you're seeing some documentation  
> > that says something else then tell us where, it needs to be corrected.
> > 
> > In Bio::Structure an Entry consists of one or Models, each of which  
> > has one or more Chains. This allows you to build macromolecular  
> > complexes (an Entry), which could have more than one defined proteins  
> > or protein complexes (Models).
> > 
> > Brian O.
> > 
> > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> > 
> > > Hello,
> > >
> > > I am trying to do a very easy think but I don't get it. I want to  
> > > write
> > > in a file a chain of a pdb. I have try a lot of thinks but what I  
> > > think
> > > that it should work is the next script:
> > >
> > > use Bio::Structure::IO;
> > > use strict;
> > >
> > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> > > =>
> > > 'pdb');
> > > my $struc = $structio->next_structure;
> > >
> > > my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> > >
> > > for my $chain ($struc->get_chains) {
> > > 	if($chain->id eq "A"){
> > > 		$new_entry->chain($chain);
> > > 		last;
> > > 	}
> > > }
> > >
> > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> > > 'pdb');#
> > > $out->write_structure($new_entry);
> > >
> > > it doesn't. I get the next error:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: add_chain: first argument needs to be a Model object ()
> > >
> > > STACK: Error::throw
> > > STACK:
> > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> > > 368
> > > STACK:
> > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:335
> > > STACK:
> > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:391
> > > STACK:
> > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> > > Structure/Entry.pm:304
> > > STACK: read_pdb.pl:10
> > > -----------------------------------------------------------
> > >
> > > As far I understand the documentation, the method chain of the object
> > > Bio::Structure::Entry requires an as input an object of type Chain.
> > >
> > > Any solution will be very welcome.
> > >
> > > best regards,
> > > Joan
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From gowthaman.ramasamy at sbri.org  Mon Dec 14 14:16:32 2009
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 14 Dec 2009 11:16:32 -0800
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
Message-ID: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>


Hi All,
I have a list of GO terms and would like to pull the GO accessions for them.
I can easily do the reverse using get_term("GO::00000051").

But can someone tell me how to get the GO accessions from GO term names, e.g. retrieve the GO accession for "citrulline metabolic process"?


Thanks very much,
Gowtham


From lsbrath at gmail.com  Mon Dec 14 14:41:39 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Mon, 14 Dec 2009 14:41:39 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>

Hello,

I have loaded BioPerl-1.6.0 onto my Mac. When I run my script I get the
following error message:

Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
/sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at project_example.pl line 4.
BEGIN failed--compilation aborted at project_example.pl line 4.

I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
Any ideas?

MEB


From scott at scottcain.net  Mon Dec 14 14:47:05 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 14 Dec 2009 14:47:05 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com>

Hi Mgavi,

I think Jason may have already started helping, but the question is:
is SeqIO.pm anywhere in those directories?  If not, why not?  If so,
why can't the perl you are using find it?  Do you have more than one
instance of perl on your machine (fairly likely if you are using a
fink-installed BioPerl)?  When you execute your script, which perl are
you using?
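
One way to check this (a minimal sketch; find_module is just an illustrative helper name, not part of BioPerl) is to print which perl is running the script and which @INC directories actually contain Bio/SeqIO.pm:

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory in the given list that contains the module file.
# Code references can appear in @INC, so those entries are skipped.
sub find_module {
    my ($rel_path, @dirs) = @_;
    return grep { !ref($_) && -f File::Spec->catfile($_, $rel_path) } @dirs;
}

print "perl binary: $^X\n";
my @hits = find_module('Bio/SeqIO.pm', @INC);
print @hits
    ? "Bio/SeqIO.pm found in: @hits\n"
    : "Bio/SeqIO.pm not found in any \@INC directory\n";
```

Run it with the same perl that runs project_example.pl. If nothing is found, the Bio directory is probably not sitting directly under one of the listed paths.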

Scott


On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite <lsbrath at gmail.com> wrote:
> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
> Any ideas?
>
> MEB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bosborne11 at verizon.net  Mon Dec 14 14:45:35 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 14 Dec 2009 14:45:35 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net>

Mgavi,

So there's a directory called /sw/lib/perl5/Bio? Or is it called  
something else?

Brian O.


On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get  
> the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread- 
> multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- 
> multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ 
> 5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error  
> message.
> Any ideas?
>
> MEB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Dec 14 16:42:09 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 Dec 2009 13:42:09 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <C56E1117A61A4835B8E794D34A157F5B@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>
	<C56E1117A61A4835B8E794D34A157F5B@jonas>
Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org>

You can read the man page from Sean Eddy or use it exactly as I showed
you:
sreformat fasta filename > filename.new

you can also use the 1st example which is a bioperl solution.
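
If neither tool is at hand, the re-wrapping itself is simple. Here is a rough sketch in plain Perl that gives every sequence line the same width, which is what the Bio::DB::Fasta indexer complains about. It assumes the whole file fits in memory; rewrap_fasta and the 60-column default are my own choices for illustration, not something taken from sreformat or BioPerl.

```perl
use strict;
use warnings;

# Re-wrap FASTA text so that every sequence line has the same width
# (except the last line of each record, which may be shorter).
sub rewrap_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my ($out, $seq) = ('', '');

    # Emit the sequence collected so far in chunks of $width characters.
    my $flush = sub {
        $out .= "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };

    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>/) {        # header line: finish the previous record
            $flush->();
            $out .= "$line\n";
        }
        else {                      # sequence line: strip whitespace, collect
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();
    return $out;
}

print rewrap_fasta(">seq1\nACGTAC\nGT\nACGT\n", 4);
```

For big databases you would want to stream record by record instead of holding everything in one string, but the idea is the same.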

-jason
On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote:

> Hi Jason,
> thank you very much for your answer.
> i am sorry to bother u again but i'm afraid i need some help with  
> that because i don't see how to use sreformat?
> i dont get it managed to write a script that works.
>
> thank u again :)
> jonas
>
>
> ----- Original Message ----- From: "Jason Stajich" <jason at bioperl.org>
> To: "Jonas Schaer" <Jonas_Schaer at gmx.de>
> Cc: <bioperl-l at bioperl.org>
> Sent: Tuesday, December 08, 2009 6:44 PM
> Subject: Re: [Bioperl-l] fasta format
>
>
>> you can run
>> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or
>> that is installed when you install the Bioperl scripts)
>> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o  
>> yournewfile.fa
>> # rename it back
>> $ mv yournewfile.fa yourfile.fa
>>
>> or
>> $ sreformat fasta yourfile.fa > yournewfile.fa
>> $ mv yournewfile.fa yourfile.fa
>>
>>
>> -jason
>> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:
>>
>>> Hi there,
>>> I have a little question concerning bioperl. I have
>>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
>>> in some fasta files. first it worked fine, but now i have some
>>> fastafiles in slightly different format (not all lines have the same
>>> length!).
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Each line of the fasta entry must be the same length except the
>>> last.
>>>   Line above #49 '
>>> ..' is 28 != 101 chars.
>>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/
>>> Fasta.pm:771
>>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: 
>>> 681
>>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
>>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
>>> STACK main::readfasta blast_eval.pm:174
>>> STACK toplevel blast_eval.pm:83
>>> -------------------------------------
>>>
>>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/
>>> site/lib/Bio/
>>> DB/Fasta.pm line 1054.
>>>
>>>
>>> Is there any way to use these fasta files with diffrent length of
>>> lines with this fasta.pm module or will i have to change the format
>>> of my fasta-files(big databases...) ?
>>>
>>> Thanks in advance for any help!
>>>
>>> Regards, Jonas
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>
>
> --------------------------------------------------------------------------------
>
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date:  
> 12/08/09 07:34:00
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Mon Dec 14 20:23:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 14 Dec 2009 19:23:05 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>

All,

The NSE format (Name/Start-End) produced by Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:

1) Stockholm Rfam reverses start and end if the strand == -1
          
   chrY/598-1

2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end

   rice-3(+)/16598648-16600199

The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
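
To make the two variants concrete, here is a small sketch in plain Perl. The names nse_rfam and nse_explicit are hypothetical helpers, not existing LocatableSeq methods, and leaving out the strand symbol for unstranded sequences in the second form is an assumption on my part.

```perl
use strict;
use warnings;

# Variant 1 (Rfam/Stockholm style): swap start and end on the minus strand.
sub nse_rfam {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Variant 2 (Gbrowse_syn style): put the strand symbol after the name.
sub nse_explicit {
    my ($name, $start, $end, $strand) = @_;
    my $sym = !$strand ? '' : $strand < 0 ? '(-)' : '(+)';
    return "$name$sym/$start-$end";
}

print nse_rfam('chrY', 1, 598, -1), "\n";
print nse_explicit('rice-3', 16598648, 16600199, 1), "\n";
```

With the inputs above this reproduces the two example strings, chrY/598-1 and rice-3(+)/16598648-16600199.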

chris


From bernd.web at gmail.com  Tue Dec 15 03:37:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 15 Dec 2009 09:37:44 +0100
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
	<C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com>

Dear Gowthaman,

A non-BioPerl solution: the Ontology Lookup service at EBI. It also
provides a web service interface.

http://www.ebi.ac.uk/ontology-lookup/

citrulline metabolic process has to be selected from the pull-down
list in the interactive page. This will return the ID (GO:0000052) and
additional info:

definition	The chemical reactions and pathways involving citrulline,
N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins.
preferred name	citrulline metabolic process
exact synonym	citrulline metabolism
subset	Prokaryotic GO subset
xref_definition	ISBN:209853"Oxford Dictionary of Biochemistry and
Molecular Biology"

The webservice is described at
http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do
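
If you would rather do the lookup locally, a name-to-accession table can be built from an OBO file with a few lines of plain Perl. This is only a sketch: it assumes you have a GO file in OBO format at hand, name_to_acc is a made-up helper name, and it ignores synonyms and obsolete terms.

```perl
use strict;
use warnings;

# Map term names to accessions from OBO-format text.
# Only [Term] stanzas are considered; [Typedef] etc. are skipped.
sub name_to_acc {
    my ($obo_text) = @_;
    my (%acc_for, $in_term, $id);
    for my $line (split /\r?\n/, $obo_text) {
        if ($line =~ /^\[(\w+)\]/) {                 # start of a new stanza
            $in_term = ($1 eq 'Term');
            undef $id;
        }
        elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif ($in_term && defined $id && $line =~ /^name:\s*(.*?)\s*$/) {
            $acc_for{$1} = $id;
        }
    }
    return \%acc_for;
}

# Tiny inline sample standing in for a real OBO file.
my $sample = <<'OBO';
[Term]
id: GO:0000052
name: citrulline metabolic process
OBO

my $acc_for = name_to_acc($sample);
print $acc_for->{'citrulline metabolic process'}, "\n";   # GO:0000052
```

For a real file, read its contents into a string first and pass that to name_to_acc.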


Regards,
Bernd


On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy
<gowthaman.ramasamy at sbri.org> wrote:
>
> Hi All,
> I have a list of GO terms. And would like to pull GO accessions for them.
> I can easily do the revere of it using get_term("GO::00000051").
>
> But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process".
>
>
> Thanks very much,
> Gowtham
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From fs5 at sanger.ac.uk  Tue Dec 15 05:38:40 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 15 Dec 2009 10:38:40 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
	<0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk>

Thanks Dave,
good to know that I haven't overlooked something bleedingly obvious in
Bioperl that already does this :-)
No problem, I have already implemented a simple parser to do it, which
works fine for my files. 
Thanks
Frank


On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote:
> Hi Frank,
> 
> You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12
> 
> Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.
> 
> It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
> 
> 
> Dave
> 
> 
> PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:
> 
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From rmb32 at cornell.edu  Tue Dec 15 10:09:43 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 15 Dec 2009 07:09:43 -0800
Subject: [Bioperl-l] AGI's fpc stuff:  Bio::Map::Physical, Bio::MapIO::fpc,
	etc
Message-ID: <4B27A6B7.6090709@cornell.edu>

Hi all,

Recently I caught an interesting thing related to making GFF files out
of FPC maps built recently using Bio::MapIO::fpc.  All of the 
coordinates in the resulting GFF3 and the sizes of the contigs and 
clones seem to be dilated by 4x from where they should be.

This didn't happen with some earlier FPC datasets I ran through these 
modules.

I haven't gone through any of this very thoroughly, but I notice in 
Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my 
$basepair = 4096', and the routine goes on to use $basepair as a sort of 
multiplier for converting the native physical map units into basepairs 
for GFF-style output.

This makes me wonder if the newer FPC datasets coming out require a 
different $basepair value, maybe 1024?  Are the original authors of 
these modules still around on this list?
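
To illustrate the arithmetic (a sketch only; units_to_bp is a made-up name, the map position is invented, and 1024 is just my guess, not a confirmed value): the same position converted with the two multipliers differs by exactly the 4x I am seeing.

```perl
use strict;
use warnings;

# Convert a position in native FPC map units to basepairs.
sub units_to_bp {
    my ($units, $bp_per_unit) = @_;
    return $units * $bp_per_unit;
}

my $pos       = 250;                         # hypothetical map position
my $with_4096 = units_to_bp($pos, 4096);     # multiplier hard-coded in print_gffstyle()
my $with_1024 = units_to_bp($pos, 1024);     # guessed alternative
printf "%d vs %d (ratio %g)\n", $with_4096, $with_1024, $with_4096 / $with_1024;
```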

Rob

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From tristan.lefebure at gmail.com  Tue Dec 15 12:18:26 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 15 Dec 2009 12:18:26 -0500
Subject: [Bioperl-l] ncurses and bioperl?
Message-ID: <200912151218.26357.tristan.lefebure@gmail.com>

Hello,

(Be careful: the following is a very naive question)

Something that I find myself missing is a simple way to look 
at alignments and trees on remote machines where I don't 
have access to X. Since, 
	(1) one can make wonderful terminal programs like screen 
and emacs by using ncurses, 
	(2) that alignment and tree objects are already well 
handled in bioperl, and 
	(3) that there is a CPAN Curses module; 

doing 1+2+3, may I dream of a Curses/BioPerl Perl program to 
render alignments and trees? I suppose a plain C program 
would be much better, but well I am a biologist...

Thanks,

--Tristan


From jason at bioperl.org  Tue Dec 15 12:50:52 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 15 Dec 2009 09:50:52 -0800
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <AEFA51CB-0070-4A1F-9FE3-DA4810129398@bioperl.org>

Not to say this isn't a good idea, but currently, for terminal viewing,
I would use retree from PHYLIP for trees. For short-read alignments,
samtools tview or Gambit (MarthLab) works great, and something like
RALEE can be used for viewing multiple sequence alignments (though it
is targeted at editing RNA alignments):
  http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ 
  http://dx.doi.org/10.1093/bioinformatics/bth489

Just that there are prior examples so would be able to learn from them  
if you still wanted to roll your own here.

-jason
On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote:

> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From roy.chaudhuri at gmail.com  Tue Dec 15 12:47:26 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 15 Dec 2009 17:47:26 +0000
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <4B27CBAE.5000303@gmail.com>

Hi Tristan,

Not a Bioperl solution, but retree from the Phylip package displays 
trees in a terminal.

Roy.

On 15/12/2009 17:18, Tristan Lefebure wrote:
> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From nml5566 at gmail.com  Tue Dec 15 16:37:30 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 15 Dec 2009 15:37:30 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>

Is the Bio::Ontology::OBOEngine module working or currently being
maintained? I tried following the documentation in the module:

 use Bio::Ontology::OBOEngine;

 my $parser = Bio::Ontology::OBOEngine->new
               ( -file => "gene_ontology.obo" );

 my $engine = $parser->parse();

But when I run the file it throws an error saying 'Can't locate object
method "parse"'. Does anyone have any experience getting this module
working, or is there an alternative BioPerl module to extract terms and
relationships out of sequence ontology files?


From hlapp at drycafe.net  Tue Dec 15 17:05:10 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 15 Dec 2009 17:05:10 -0500
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>

That shouldn't happen, I suppose, but you're not really supposed to use  
the engine directly. Rather it will be used as a backing parser by the  
Bio::OntologyIO parser you choose. Have you tried that route and found  
it not to work?
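
For example, something along these lines (a sketch from memory, so please check it against the Bio::OntologyIO documentation; term_table is just an illustrative name). The helper only relies on next_ontology() on the parser, get_all_terms() on each ontology, and identifier()/name() on each term:

```perl
use strict;
use warnings;

# Collect [identifier, name] pairs for every term the parser delivers.
# $parser is expected to behave like a Bio::OntologyIO object.
sub term_table {
    my ($parser) = @_;
    my @rows;
    while (my $ont = $parser->next_ontology) {
        push @rows, map { [ $_->identifier, $_->name ] } $ont->get_all_terms;
    }
    return @rows;
}

# Typical use (needs BioPerl and a local OBO file):
#   use Bio::OntologyIO;
#   my $parser = Bio::OntologyIO->new(-format => 'obo',
#                                     -file   => 'gene_ontology.obo');
#   print join("\t", @$_), "\n" for term_table($parser);
```

Relationships can be pulled from each ontology object in a similar loop.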

	-hilmar

On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:

> Is the Bio::Ontology::OBOEngine module working or being currently
> maintained? I tried following the documentation in the module:
>
> * use Bio::Ontology::OBOEngine;
>
> my $parser = Bio::Ontology::OBOEngine->new
>               ( -file => "gene_ontology.obo" );
>
> my $engine = $parser->parse();
>
> *But, it throws an error when I run the file saying 'Can't locate  
> object
> method "parse" '. Does anyone have any experience getting this module
> working; or, is there any alternative bioperl module to extract  
> terms and
> relationships out of sequence ontology files?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Wed Dec 16 04:58:16 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Dec 2009 10:58:16 +0100
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <DB8FB8FF-7DCE-4718-9E17-856F09AE1F46@sbc.su.se>

I'd tend to be inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.)

It's a question of whether to optimize human-readability over machine-readability: option 2 favors the former over the latter, and option 1 the reverse.

Whichever way you go, I think

> a new method that creates this, and deprecate[s] out simple non-stranded NSE

would be great.


Dave


From maj at fortinbras.us  Wed Dec 16 07:51:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 16 Dec 2009 07:51:24 -0500
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>

I'm with Dave; option 1 is cleaner. The only problem might be the automatic 
interpretation of older output as always plus strand, but presumably these would 
have had to record the strandedness explicitly elsewhere, so they would be 
updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 14, 2009 8:23 PM
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes


> All,
>
> The current output for NSE format (Name/Start-End) via 
> Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have 
> seen two variations of NSE that incorporate strandedness:
>
> 1) Stockholm Rfam reverses start and end if the strand == -1
>
>   chrY/598-1
>
> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>
>   rice-3(+)/16598648-16600199
>
> The former breaks fewer things within BioPerl, but the latter seems more 
> explicit.  Any preferences?  Do we want a new method that creates this, and 
> deprecate out simple non-stranded NSE?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From tuco at pasteur.fr  Wed Dec 16 09:14:28 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 15:14:28 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO
	(Genbank)
Message-ID: <4B28EB44.3080006@pasteur.fr>

Hi,

I wrote a small GenBank parser a few months ago, before the BioPerl 1.6.0
release.
I tried to use my code again, but now the output of my parser is empty.
It looks like the annotation of the seqfeatures is no longer populated.

Here is the code I used previously:

while(my $seq = $streamer->next_seq()){

     #We only want to retrieve CDS features...
     foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
         print $ofh join("#",
                         $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
                         $feat->annotation()->get_Annotations('gene')
                           ? $feat->annotation()->get_Annotations('gene')      # Gene name
                           : $feat->annotation()->get_Annotations('locus_tag'),
                         $feat->annotation()->get_Annotations('product'),      # Description
                        ),"\n";
     }
}

$feat is a Bio::SeqFeature::Generic object

If I print Dumper($feat->annotation()) here is the output :

$VAR1 = bless( {
                  '_typemap' => bless( {
                                         '_type' => {
                                                      'comment' => 'Bio::Annotation::Comment',
                                                      'reference' => 'Bio::Annotation::Reference',
                                                      'dblink' => 'Bio::Annotation::DBLink'
                                                    }
                                       }, 'Bio::Annotation::TypeManager' ),
                  '_annotation' => {}
                }, 'Bio::Annotation::Collection' );

Have any changes been made to the way the annotation object is populated?

Thanks for any clue, and sorry if my question looks stupid.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From cjfields at illinois.edu  Wed Dec 16 10:09:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 16 Dec 2009 09:09:56 -0600
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <4B28EB44.3080006@pasteur.fr>
References: <4B28EB44.3080006@pasteur.fr>
Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>

Emmanuel,

The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation objects.  The way this was implemented was considered unsatisfactory for various reasons, so we reverted to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print "  tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print "    value: ", $value, "\n";
        }   
    }
}

You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
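
For the CDS report in the original question, the same output can be built from tag-value pairs. The following is only a sketch, not code from the HOWTO: the helper names first_tag_value and cds_line and the in-memory example feature are made up for illustration. Note that get_tag_values() throws an exception when the tag is absent, so the sketch checks has_tag() first.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Return the first value stored under $tag, or undef if the tag is absent.
# (Hypothetical helper; get_tag_values() would throw on a missing tag.)
sub first_tag_value {
    my ($feat, $tag) = @_;
    return undef unless $feat->has_tag($tag);
    my ($value) = $feat->get_tag_values($tag);
    return $value;
}

# Build one "locus#gene#product" line, falling back to the locus tag
# when the feature has no gene name.
sub cds_line {
    my ($feat) = @_;
    my $locus   = first_tag_value($feat, 'locus_tag') // '';
    my $gene    = first_tag_value($feat, 'gene')      // $locus;
    my $product = first_tag_value($feat, 'product')   // '';
    return join('#', $locus, $gene, $product);
}

# Example feature built in memory; with real data the features would
# come from $seq->get_SeqFeatures() on a sequence read by Bio::SeqIO.
my $feat = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -start       => 1,
    -end         => 90,
    -tag         => {
        locus_tag => 'ABC_0001',
        product   => 'hypothetical protein',
    },
);

print cds_line($feat), "\n";
```

In a real parser, cds_line() would be called inside the grep over 'CDS' features in place of the annotation() calls.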

chris

On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:

> Hi,
> 
> I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0.
> I tried to use my code once again but now the output of my parser is empty.
> It looks like Annotation from seqfeatures is not filled anymore.
> 
> Here is the code I used previously:
> 
> while(my $seq = $streamer->next_seq()){
> 
>    #We only want to retrieve CDS features...
>    foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>        print $ofh join("#",
>                        $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>                        $feat->annotation()->get_Annotations('gene')
>                          ? $feat->annotation()->get_Annotations('gene')      # Gene name
>                          : $feat->annotation()->get_Annotations('locus_tag'),
>                        $feat->annotation()->get_Annotations('product'),      # Description
>                       ),"\n";
>    }
> }
> 
> $feat is a Bio::SeqFeature::Generic object
> 
> If I print Dumper($feat->annotation()) here is the output :
> 
> $VAR1 = bless( {
>                 '_typemap' => bless( {
>                                        '_type' => {
>                                                     'comment' => 'Bio::Annotation::Comment',
>                                                     'reference' => 'Bio::Annotation::Reference',
>                                                     'dblink' => 'Bio::Annotation::DBLink'
>                                                   }
>                                      }, 'Bio::Annotation::TypeManager' ),
>                 '_annotation' => {}
>               }, 'Bio::Annotation::Collection' );
> 
> Have some changes been made into the way annotation object is populated?
> 
> Thanks for any clue and sorry if my question look stupid
> 
> Regards
> 
> Emmanuel
> 
> -- 
> -------------------------
> Emmanuel Quevillon
> Biological Software and Databases Group
> Institut Pasteur
> +33 1 44 38 95 98
> tuco at_ pasteur dot fr
> -------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From tuco at pasteur.fr  Wed Dec 16 10:37:45 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 16:37:45 +0100
Subject: [Bioperl-l] Data missing into Annotation object
 using	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <4B28FEC9.1080509@pasteur.fr>

On 12/16/2009 04:09 PM, Chris Fields wrote:
> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation.  The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>      print "primary tag: ", $feat_object->primary_tag, "\n";
>      for my $tag ($feat_object->get_all_tags) {
>          print "  tag: ", $tag, "\n";
>          for my $value ($feat_object->get_tag_values($tag)) {
>              print "    value: ", $value, "\n";
>          }
>      }
> }
>
> You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
>    
Hi Chris

Thanks for the info.
I have indeed reverted to using $feat->get_tag_values() and it works as it did 
previously.
For my small problem I can keep this solution, which is well suited to my 
needs.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From sung at bio.cc  Wed Dec 16 12:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

I wanted to get the PubMed identifier from a PDB file using Bio::Structure,
so I looked through the code.
I know that Bio::Structure::IO::pdb gets the relevant info from either
'JRNL' or 'REMARK 1'.
However, I could not see any actual code parsing 'PMID'.

>From pdb.pm, what I see:

sub _read_PDB_jrnl {
...
           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

sub _read_PDB_remark_1 {
...
               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

>From my script, I did:

($struc->annotation->get_Annotations('reference'))[0]->authors
($struc->annotation->get_Annotations('reference'))[0]->title

or

my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
for my $key (keys %{$hash_ref}) {
   print $key,": ",$hash_ref->{$key},"\n";
}

Is there any plan to include code that extracts the 'PMID'?
Or did I miss something?
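
Until the parser handles it, one possible workaround is to scan the raw PDB lines for the PMID sub-record directly. This is a minimal sketch and not part of Bio::Structure::IO::pdb: it assumes the PMID appears on a JRNL line in the layout used by recent PDB format versions (record name, sub-record name PMID, then the integer), and the function name pmid_from_pdb_lines and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Return the first PMID found on a "JRNL ... PMID <digits>" line,
# or an empty list / undef when no such line is present.
# Assumed layout: "JRNL" in columns 1-4, "PMID" as the sub-record name,
# followed by the integer identifier.
sub pmid_from_pdb_lines {
    my @lines = @_;
    for my $line (@lines) {
        next unless $line =~ /^JRNL\s+PMID\s+(\d+)/;
        return $1;
    }
    return;
}

# Made-up header lines standing in for the top of a PDB file.
my @header = (
    'JRNL        AUTH   A.N.AUTHOR,A.N.OTHER',
    'JRNL        TITL   AN EXAMPLE TITLE',
    'JRNL        PMID   12345678',
);

my $pmid = pmid_from_pdb_lines(@header);
print defined $pmid ? "PMID: $pmid\n" : "no PMID found\n";
```

With a real file, the lines could be read from an ordinary filehandle before or after handing the same file to Bio::Structure::IO.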

Cheers,
Sung


From nml5566 at gmail.com  Wed Dec 16 14:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
	<F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes I did find that and it works very well. Now I'm wondering, is
it possible to search for similar terms using a string instead of a
Bio::Ontology term object? For example, I'd like to search for the synonym:
"transcription start site" and have it return all similar terms. But, it
throws an error if I pass in a simple query like that.

-Nathan

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> That shouldn't happen I suppose, but you're not supposed really to use the
> engine directly. Rather it will be used as a backing parser by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
>        -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
>  Is the Bio::Ontology::OBOEngine module working or being currently
>> maintained? I tried following the documentation in the module:
>>
>> use Bio::Ontology::OBOEngine;
>>
>> my $parser = Bio::Ontology::OBOEngine->new
>>              ( -file => "gene_ontology.obo" );
>>
>> my $engine = $parser->parse();
>>
>> But, it throws an error when I run the file saying 'Can't locate object
>> method "parse" '. Does anyone have any experience getting this module
>> working; or, is there any alternative bioperl module to extract terms and
>> relationships out of sequence ontology files?
>>
>>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>


From cjfields1 at gmail.com  Wed Dec 16 19:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>

Howdy from Google Groups


From cjfields1 at gmail.com  Wed Dec 16 20:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the
creation of a mirror for the BioPerl mail list, if the last post
didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web
interface or via email.  Have fun!

chris


From ocarnorsk138 at gmail.com  Wed Dec 16 20:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
References: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields <cjfiel... at gmail.com> wrote:
> Howdy from Google Groups


From p.j.a.cock at googlemail.com  Thu Dec 17 05:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>

On Dec 17, 1:01 am, Chris Fields <cjfiel... at gmail.com> wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email.  Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of
an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed
to the mailing list to post (since the mailing list normally only
allows subscribers to post)?

Peter


From David.Messina at sbc.su.se  Thu Dec 17 07:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.


> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?


I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.


Dave


From cjfields at illinois.edu  Thu Dec 17 08:21:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 07:21:46 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu>


On Dec 17, 2009, at 6:25 AM, Dave Messina wrote:

> Very nice, Chris and Hilmar! That'll be great.
> 
> 
> 
>> Does this mean a Google Groups user doesn't have to be subscribed
>> to the mailing list to post (since the mailing list normally only
>> allows subscribers to post)?
> 
> 
> I think that's right. From the Google groups page:
> 
>> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
> 
> 
> 
> 
> Dave

It is moderated by user to deal with spam.  Hilmar's already a manager/co-owner, and either of us can add more as needed.

chris


From hlapp at drycafe.net  Thu Dec 17 09:52:33 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 17 Dec 2009 09:52:33 -0500
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>


On Dec 17, 2009, at 5:50 AM, Peter wrote:

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post


Yes. They can post through the Google Groups web interface.

The email address for mirrored groups is the one of the list being  
mirrored though, bioperl-l at bioperl.org in this case, and so in order  
to post by email you still have to be subscribed at the bioperl-l  
list. At least that's what the docs at Google say.

I haven't tried yet posting to the group at the bioperl-l at  
googlegroups dot com email under an email address that isn't  
subscribed to bioperl-l at bioperl dot org. Maybe it actually would  
work, contrary to docs.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From jay at jays.net  Thu Dec 17 12:05:24 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 17 Dec 2009 11:05:24 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net>

On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote:
> I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs.

In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. 

Here is the configuration set I recommend:

   http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png

Your membership rolls will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so.

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From robert.bradbury at gmail.com  Thu Dec 17 14:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>

Just to close out the issue of bioperl forking (in particular accesses to
external databases through get_sequence) which involves individual database
sub-modules and not collecting its children.

As it turns out the code does do an explicit fork, it looks like so the
child process can read from the database while the parent process
manipulates the data as it becomes available.  Now, one could argue that a
threaded model might be better since now threads are fairly standard OS
tools in current environments.

But I couldn't find any functions which actually wait for the forked process
(presumably because they are created for "future" use).  But nor is there
any indication in the pages I've found in most of the documentation (which
is spread across the web) or Wiki that explain that "creating child
processes" is how these functions work and one *needs* to collect those
children after each use or else zombie processes will accumulate, which on
"reasonable" systems with per-user process limits will create problems for
proper program functioning.  Nor (it would appear) does the parent process
set up a SIGCHLD "catcher" which could collect the processes once they exit
(which I expect in the case of "get_sequence" would be after closing of the
socket which actually fetched the sequence from GenBank).

It can be resolved easily enough by adding a call after each use of these
functions:
   $kid = waitpid(-1, WNOHANG);
But typically, as a programmer, I should not be responsible for having to
clean up the leftovers of library calls (unless said cleanup requirements
are clearly documented).


But to a "newbie" using the functions, coming from a functional background
(C), not an OO background (which at least I would tend to view as a wart on
the otherwise robust Perl language), there are two problems:
1. The lack of documentation and examples explaining how the functions work
and how they must be handled at a higher level (by executing explicit wait
system calls).
2. The lack of code in the BioPerl functions to deal with the forked
processes which they create.  Functional programmers have a perspective --
if you create it -- you have to clean it up.  It would appear that in the
transition to OO programming (or perhaps simply for expediency) that detail
was left out of both (either/and) the documentation and the code.  From this
standpoint one could view garbage collectors as being fundamentally evil --
because they gloss over the fact that programmers should know what they are
doing and when they are doing it.

So, everywhere in the documentation where there is a get_sequence call (or
anything which accesses an external database which causes a fork to occur)
there should be a modification as I have outlined above -- or else the code
should be corrected so orphaned children are always collected and not
allowed to accumulate.
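
The reaping step described above can be sketched in core Perl alone. This illustrates the general technique and is not BioPerl code: reap_children is a hypothetical helper, and the demo forks three trivial children itself rather than going through get_sequence. A single waitpid call collects at most one child, so the sketch loops until nothing is left to collect.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Collect every child that can be collected and return how many were reaped.
# Pass WNOHANG to return immediately when no child has exited yet, or 0 to
# block until all children are gone. waitpid returns -1 when there are no
# children at all and 0 (under WNOHANG) when none has exited yet.
sub reap_children {
    my ($flags) = @_;
    my $reaped = 0;
    while ((my $kid = waitpid(-1, $flags)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Demo: fork three children that exit at once.
my $spawned = 0;
for (1 .. 3) {
    my $pid = fork;
    die "Couldn't fork: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: nothing to do
    $spawned++;
}

# Blocking mode here so the demo is deterministic; after a library call
# one would more likely use reap_children(WNOHANG).
my $reaped = reap_children(0);
print "spawned $spawned, reaped $reaped\n";
```

Calling reap_children(WNOHANG) after each sequence fetch would keep already-exited children from piling up without stalling the parent, at the cost of sometimes leaving a child that has not quite finished until the next call.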


From robert.bradbury at gmail.com  Thu Dec 17 15:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>

Oh, yes, in case it was not clear, the fork call which fails is in
DB/WebDBSeqI.pm: line 722
     defined(my $pid = fork)
          or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limits for the
user (due to accumulated background processes which are uncollected).

And they could be resolved by simply executing a simple waitpid call for
prior orphaned children before forking [1].  But such a succinct solution
would violate "functional" programming rules -- clean up what you create --
instead they would tend to fall into the OO camp -- "Oh don't worry the
garbage collector will take care of it".  Green programming is a little less
cavalier.

Robert

1. IMO, a very very real problem with programming today is that there is no
connection between programmers and the cost of their programs.  How many
programmers know the instruction cycle time of their computers, what does an
instruction cost in terms of W consumed, W wasted (heat generation),
fruitless scanning over uncollected zombie processes, etc.  It may be that
only programmers who grew up in the era when CPU cycles were expensive
(300 ns/cycle), and who know what each instruction required in terms of cycles,
consider these perspectives.  Now things (cpu use, processor use, etc) tend
to be swept under the rug and it appears that that is the case with the
standard implementation of bioperl.  The documentation does not clearly state
that additional sub-processes may be created and need to be collected.  You
are providing a utility that only works "this much".  And guess what -- I
happen to have run into the "this".


From cjfields at illinois.edu  Thu Dec 17 15:25:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:25:56 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <BFDD2A52-FB3D-4CC4-A5BF-C53A3DAC9C41@illinois.edu>

Robert,

I have previously outlined specifically why you are seeing the fork issue, and a possible solution.  IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast.  Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank.  Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime.

Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X.  We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution.
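
A minimal sketch of the temp-file route, assuming a BioPerl installation that provides Bio::DB::GenBank: the -retrievaltype option is handled by Bio::DB::WebDBSeqI, and 'tempfile' should avoid the forking 'pipeline' mode. The accession is taken from the command line, so nothing is fetched over the network unless one is given.

```perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Ask for temp-file retrieval instead of the forked pipeline.
my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
print "retrieval type: ", $gb->retrieval_type, "\n";

# Only contact the remote database when an accession is supplied,
# e.g. "perl fetch.pl J00522" (script name and accession are examples).
if (my $acc = shift @ARGV) {
    my $seq = $gb->get_Seq_by_acc($acc);
    print $seq->display_id, "\t", $seq->length, "\n";
}
```

The resulting sequence object can then be handed to Bio::Tools::Run::RemoteBlast as in any other pipeline.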

My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'.  What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so.  The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect).  Please keep that in mind.

chris

On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote:

> Just to close out the issue of bioperl forking (in particular accesses to
> external databases through get_sequence) which involves individual database
> sub-modules and not collecting its children.
> 
> As it turns out the code does do an explicit fork, it looks like so the
> child process can read from the database while the parent process
> manipulates the data as it becomes available.  Now, one could argue that a
> threaded model might be better since now threads are fairly standard OS
> tools in current environments.
> 
> But I couldn't find any functions which actually wait for the forked process
> (presumably because they are created for "future" use).  But nor is there
> any indication in the pages I've found in most of the documentation (which
> is spread across the web) or Wiki that explain that "creating child
> processes" is how these functions work and one *needs* to collect those
> children after each use or else zombie processes will accumulate, which on
> "reasonable" systems with per-user process limits will create problems for
> proper program functioning.  Nor (it would appear) does the parent process
> setup a SIGCHLD "catcher" which could collect the processes once they exit
> (which I expect in the case of "get_sequence" would be after closing of the
> socket which actually fetched the sequence from Genbank.
> 
> It can be resolved easily enough by adding a call after each use of these
> functions:
>   $kid = waitpid(-1, WNOHANG);
> But typically, as a programmer, I should not be responsible for having to
> clean up the leftovers of library calls (unless said cleanup requirements
> are clearly documented).
> 
> 
> But to a "newbie" using the functions, coming from a functional background
> (C), not an OO background (which at least I would tend to view as a wart on
> the otherwise robust Perl language), there are two problems
> 1. The lack of documentation and examples explaining how the functions work
> and how they must be handled at a higher level (by executing explicit wait
> system calls).
> 2. The lack of code in the BioPerl functions to deal with the forked
> processes which they create.  Functional programmers have a perspective --
> if you create it -- you have to clean it up.  It would appear that in the
> transition to OO programming (or perhaps simply for expediency) that detail
> was left out of both (either/and) the documentation and the code.  From this
> standpoint one could view garbage collectors as being fundamentally evil --
> because they gloss over the fact that programmers should know what they are
> doing and when they are doing it.
> 
> So, everywhere in the documentation where there is a get_sequence call (or
> anything which accesses an external database which causes a fork to occur)
> there should be a modification as I have outlined above -- or else the code
> should be corrected so orphaned children are always collected and not
> allowed to accumulate.


From cjfields at illinois.edu  Thu Dec 17 15:29:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:29:10 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
	<deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
Message-ID: <FF6F8AAD-FBBE-4FAD-BB88-59A779CC7131@illinois.edu>

On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote:

> Oh, yes, in case it was not clear, the fork calls which fails is in
> DB/WebDBSeqI.pm: line 722
>     defined(my $pid = fork)
>          or $self->throw("'Couldn't fork: $!");

Okay, that's a bit more helpful.

> And of course that is because Linux has reached the process limits for the
> user (due to accumulated background processes which are uncollected).

Right, but again, we need to check this in a cross-platform compatible way.

> And they could be resolved by simply executing a simple waitpid call for
> prior orphaned children before forking [1] But such a succinct solution
> would violate "functional" programming rules -- clean up what you create --
> instead they would tend to fall into the OO camp -- "Oh don't worry the
> garbage collector will take care of it".  Green programming is a little less
> cavalier.
> 
> Robert
> 
> 1. IMO, a very very real problem with programming today is that there is no
> connection between programmers and the cost of their programs.  How many
> programmers know the instruction cycle time of their computers, what does an
> instruction cost in terms of W consumed, W wasted (heat generation),
> fruitless scanning over uncollected zombie processes, etc.  It may be that
> only that programmers who grew up in the era when CPU cycles were expensive
> (300 ns/cycle) who know what each instruction required in terms of cycles
> consider these perspectives.  Now things (cpu use, processor use, etc) tend
> to be swept under the rug and it appears that that is the case with the
> standard implementation of bioper.  The documentation does not clearly state
> that additional sub-processes may be created and need to be collected.  You
> are providing a utility that only works "this much".  And guess what -- I
> happen to have run into the "this".

Um, yeah.  Okay.

chris


From robfsouza at gmail.com  Fri Dec 18 13:07:34 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Fri, 18 Dec 2009 13:07:34 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
Message-ID: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>

Hi,

I've been dealing with an apparent bug in the output of NCBI's BLAST
programs (blastall, blastpgp) which sometimes produces output like the
one below.
I think I've managed to produce a work around for Bioperl blast.pm
parser and would like to contribute it to Bioperl.
The fix is based on blast.pm from the CVS tree (downloaded some months
ago...) and is attached to this message.
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem
to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
          hypothetical protein [Nasonia vitripennis]
         Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
           P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
          ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
          LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
             DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
          +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
          ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                 +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091218/3771d91c/attachment-0003.obj>

From cjfields at illinois.edu  Fri Dec 18 13:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <DC79216C-9DD8-47AE-876F-7BBAEC6C43CB@illinois.edu>

Robson, 

Any chance you could check this against SVN?  We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris


On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
> 
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
> 
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> 
>> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
>           hypothetical protein [Nasonia vitripennis]
>          Length = 1774
> 
>  Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
>  Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)
> 
> Query: 0   -
> 
> Sbjct: 328 P                                                            328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
>            P PP +   + P       KTK+      K+P  K         +
> Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376
> 
> Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
>           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
> Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432
> 
> Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
>           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
> Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491
> 
> Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
>              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
> Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548
> 
> Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
>           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
> Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602
> 
> Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
>           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
> Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661
> 
> Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
>                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
> Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
> <blast_patched.pm>_______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Fri Dec 18 18:00:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 18 Dec 2009 23:00:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>

On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson

Do you have a complete example of this kind of funny output?
This problem has also been reported with blastpgp for the
Biopython parser. I'd love an example for our unit tests
(probably worth doing in BioPerl too). Could you upload a
test case here?:

http://bugzilla.open-bio.org/show_bug.cgi?id=2927

Thanks!

Peter @ Biopython


From biopython at maubp.freeserve.co.uk  Sat Dec 19 06:19:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 11:19:53 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just upload my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Thank you,

Peter


From maj at fortinbras.us  Sat Dec 19 14:52:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 19 Dec 2009 14:52:45 -0500
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
Message-ID: <F7E9AD08646A44D3AB29A4504A725095@NewLife>

Hi All, 

Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
is at beta in the bioperl-run trunk. It wraps all the programs of the 
NCBI's new blast+-2.2.22 suite 
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
and integrates them, allowing you to create, mask, and query 
databases from within a single factory object. See the HOWTO
http://www.bioperl.org/wiki/HOWTO:BlastPlus
for the usual usage and implementation details.

Happy coding--
MAJ 


From David.Messina at sbc.su.se  Sat Dec 19 15:34:10 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 19 Dec 2009 21:34:10 +0100
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se>

Sweet! Thanks, Mark.


Dave


From cjfields at illinois.edu  Sat Dec 19 17:44:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 16:44:46 -0600
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu>

Very nice!  We'll definitely give it a try here (along with the requisite feedback, of course).

chris

On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote:

> Hi All, 
> 
> Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
> is at beta in the bioperl-run trunk. It wraps all the programs of the 
> NCBI's new blast+-2.2.22 suite 
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and integrates them, allowing you to create, mask, and query 
> databases from within a single factory object. See the HOWTO
> http://www.bioperl.org/wiki/HOWTO:BlastPlus
> for the usual usage and implementation details.
> 
> Happy coding--
> MAJ 


From cjfields at illinois.edu  Sat Dec 19 23:59:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 22:59:38 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
	<6723123C0ABD447190639AE1F5D1A6A7@NewLife>
Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu>

I think option 1 is cleaner as well. It was very easily added, so I've committed it to main trunk; I consider this a bug, since one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1).

I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else).

chris
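
For illustration, option 1 (the Rfam convention of writing start and end in reverse order on the minus strand) could be sketched in plain Perl as below. This is not the code committed to trunk; nse_string and parse_nse are hypothetical helper names, not Bio::LocatableSeq methods, and the sketch assumes start <= end on input.

```perl
use strict;
use warnings;

# Build a Name/Start-End string. On the minus strand the coordinates are
# written high-to-low so that the strand survives a round trip.
sub nse_string {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$name/$start-$end";
}

# Parse the string back. A first coordinate larger than the second is
# read as strand -1; anything else is treated as strand +1.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $first, $second) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "Not an NSE string: $nse\n";
    my $strand = 1;
    if ($first > $second) {
        ($first, $second) = ($second, $first);
        $strand = -1;
    }
    return ($name, $first, $second, $strand);
}

print nse_string('chrY', 1, 598, -1), "\n";    # chrY/598-1
my ($name, $start, $end, $strand) = parse_nse('chrY/598-1');
print "$name $start $end $strand\n";           # chrY 1 598 -1
```

One limitation of this convention: a single-base feature (start == end) cannot carry its strand this way, so it would come back as +1.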

On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote:

> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Monday, December 14, 2009 8:23 PM
> Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
> 
> 
>> All,
>> 
>> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:
>> 
>> 1) Stockholm Rfam reverses start and end if the strand == -1
>> 
>>  chrY/598-1
>> 
>> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>> 
>>  rice-3(+)/16598648-16600199
>> 
>> The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
>> 
>> chris


From e.osimo at gmail.com  Sun Dec 20 13:19:37 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Sun, 20 Dec 2009 19:19:37 +0100
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>

Hello everyone,
I have a very particular problem: I'd like to draw in a single track
different SNPs with a glyph that allows me to see graphically their
importance.
For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like to have
the first depicted small and the last one big, with the ones in between sized
accordingly.
I'd also be satisfied with a color gradient.
What I cannot do is set an option such as -height per feature, in the
Bio::SeqFeature::Generic->new call that I use for each of my objects, rather
than in the add_track section.
If I set it in the add_track section, all the glyphs are then of the same
size (or color).
If, otherwise, I add a different track for each object, my picture becomes
too big.

Please, help!
Thanks
Emanuele


From ajmackey at gmail.com  Sun Dec 20 13:41:14 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Sun, 20 Dec 2009 13:41:14 -0500
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com>

You can set the height as a callback sub, rather than a constant -- the
callback will get passed the feature about to be drawn, from which you can
calculate the "importance", and return the desired height, dynamically.

-Aaron
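
A minimal sketch of such a callback, assuming the SNP "importance" (1 to 10) is stored in each feature's score. The sub name height_for_snp, the pixel scale, and the MockFeature class are made up for illustration; MockFeature only stands in for a real feature so the callback can be tried without Bio::Graphics installed.

```perl
use strict;
use warnings;

# Map a feature's score (assumed to hold the importance, 1..10) to a
# glyph height in pixels. Bio::Graphics passes the feature as the first
# argument to option callbacks.
sub height_for_snp {
    my ($feature) = @_;
    my $importance = $feature->score;
    $importance = 1 unless defined $importance;
    return 4 + 2 * $importance;    # 6 px for importance 1, 24 px for 10
}

# Stand-in for a feature object; a real script would use
# Bio::SeqFeature::Generic objects created with -score => $importance.
package MockFeature;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub score { return $_[0]{score} }

package main;

for my $importance (1, 5, 10) {
    my $snp = MockFeature->new(score => $importance);
    printf "importance %2d -> height %d\n", $importance, height_for_snp($snp);
}

# In the real script the callback replaces the constant height:
#   $panel->add_track(\@snps, -glyph => 'generic', -height => \&height_for_snp);
```

The same approach should work for a color gradient by passing a callback for a color option such as -bgcolor instead of -height.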

On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo <e.osimo at gmail.com> wrote:

> Hello everyone,
> I have a very particular problem: I'd like to draw in a single track
> different SNPs with a glyph that allows me to see graphically their
> importance.
> For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the
> first depicted small, and the last one big, with the ones in between with
> according sizes.
> I'd be satisfied also with a color gradient.
> What I cannot do is to set the option -height , for example, instead than
> in
> the add_track section, in the Bio::SeqFeature::Generic->new that I use for
> each of my objects.
> If I set it in the add_track section, all the glyphs are then of the same
> size (or color).
> If, otherwise, I add a different track for each object, my picture becomes
> too big.
>
> Please, help!
> Thanks
> Emanuele


From robfsouza at gmail.com  Sat Dec 19 06:06:16 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 19 Dec 2009 06:06:16 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>

Hi Peter,

I just uploaded my example. I also reported this bug to the NCBI
developers and I hope they can fix it, since it is easy to reproduce.
I just forgot to mention the blastpgp version: 2.2.18
Best,
Robson

On Fri, Dec 18, 2009 at 6:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
> <robfsouza at gmail.com> wrote:
>> Hi,
>>
>> I've been dealing with an apparent bug in the output of NCBI's BLAST
>> programs (blastall, blastpgp) which sometimes produces output like the
>> one below.
>> I think I've managed to produce a work around for Bioperl blast.pm
>> parser and would like to contribute it to Bioperl.
>> The fix is based on blast.pm from the CVS tree (downloaded some months
>> ago...) and is attached to this message.
>> Best,
>> Robson
>
> Do you have a complete example of this kind of funny output?
> This problem has also been reported with blastpgp for the
> Biopython parser. I'd love an example for our unit tests
> (probably worth doing in BioPerl too). Could you upload a
> test case here?:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>
> Thanks!
>
> Peter @ Biopython
>


From biopython at maubp.freeserve.co.uk  Mon Dec 21 10:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
>
> Hi Peter,
>
> I just upload my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of
it "by hand", it would be very tricky to try and parse as a special
case. It really does look like a bug in BLAST to me. The alignment
includes an initial pair, a leading gap in the query (with a coordinate
of zero), plus a residue from the match sequence (with a sensible
coordinate). The alignment statistics include this (extra) pair in
the alignment length.

You said you were using blastpgp version 2.2.18, so I tried this
with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast
-num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.

I might therefore expect the NCBI to say "yes, this is a bug in
the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance
release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost
Christmas/New Year so we may not expect an answer until Jan.

Peter
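
As a rough illustration of what special-casing this output would involve, the sketch below drops the spurious leading blocks (a "Query:" line with coordinate 0 plus the "Sbjct:" line paired with it) from the alignment lines of one HSP. It is plain Perl, is not Robson's patch (the attachment was scrubbed from the archive), and is not how Bio::SearchIO::blast works; strip_zero_query_blocks is a hypothetical name.

```perl
use strict;
use warnings;

# Return the alignment lines of one HSP with the bogus blocks removed:
# every "Query: 0" line and everything up to and including the Sbjct
# line that follows it.
sub strip_zero_query_blocks {
    my @lines = @_;
    my (@kept, $skipping);
    for my $line (@lines) {
        if ($line =~ /^Query:\s+0\b/) {    # query coordinate 0 marks a bogus block
            $skipping = 1;
            next;
        }
        if ($skipping) {
            $skipping = 0 if $line =~ /^Sbjct:/;
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

my @hsp = (
    'Query: 0   -',
    '',
    'Sbjct: 328 P    328',
    '',
    'Query: 612 VPPPP 616',
    'Sbjct: 329 TPEPP 333',
);
print "$_\n" for strip_zero_query_blocks(@hsp);
```

Note that this only cleans up the alignment text. The reported statistics (for example the alignment length in the Identities line) would still count the extra pair, so a parser that cross-checks them would also need adjusting.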


From maj at fortinbras.us  Mon Dec 21 13:52:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 13:52:01 -0500
Subject: [Bioperl-l] test fail
Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife>

fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)

t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
#          got: '1..4'
#     expected: 'complement(5..8)'

t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
#          got: 'complement(5..8)'
#     expected: '1..4'
# Looks like you failed 2 tests of 51.

MAJ


From cjfields at illinois.edu  Mon Dec 21 14:20:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 13:20:32 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>

Saw that from the other day (LocatableSeq commit).  I'll check it out.

chris

On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:

> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
> 
> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
> #          got: '1..4'
> #     expected: 'complement(5..8)'
> 
> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
> #          got: 'complement(5..8)'
> #     expected: '1..4'
> # Looks like you failed 2 tests of 51.
> 
> MAJ


From scott at scottcain.net  Mon Dec 21 15:02:20 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 21 Dec 2009 15:02:20 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>

Hi All,

Today it was pointed out to me that the Bio::Graphics documentation
links on the BioPerl wiki are broken, no doubt because Bio::Graphics
is no longer part of bioperl-core (is that how it should be referred
to?).  Anyway, the question is: what is the right way to rectify this
problem?  Since other things may get broken out in the future, I
suppose we should get some sort of standard established.  Can a
release of Bio::Graphics be placed somewhere on the BioPerl wiki
server to be processed?

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 15:22:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 14:22:39 -0600
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>

We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links.  Shouldn't be too hard to do.

chris

On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:

> Hi All,
> 
> Today it was pointed out to me that the Bio::Graphics documentation
> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
> is no longer part of bioperl-core (is that how it should be referred
> to?).  Anyway, the question is: what is the right way to rectify this
> problem?  Since other things may get broken out in the future, I
> suppose we should get some sort of standard established.  Can a
> release of Bio::Graphics be placed somewhere on the BioPerl wiki
> server to be processed?
> 
> Thanks,
> Scott
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 16:12:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 15:12:45 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
	<E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
Message-ID: <A396F39A-76BC-44B4-8302-4C622257E6ED@illinois.edu>

'Twas a bad test call.  I basically changed the test to pull each feature directly by its primary tag, check it against the original sf prior to revcomp, then check that the location was revcomp'ed correctly.

chris
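
For reference, the coordinate arithmetic being checked can be sketched as follows. This is plain Perl, not the SeqUtils code or the test itself; the sub names are hypothetical, and the sequence length of 8 is an assumption chosen only because it is consistent with the '1..4' and 'complement(5..8)' strings in the failure report.

```perl
use strict;
use warnings;

# Map a feature location onto the reverse complement of a sequence of
# length $len: coordinates are mirrored and the strand is flipped.
sub revcomp_location {
    my ($start, $end, $strand, $len) = @_;
    return ($len - $end + 1, $len - $start + 1, -$strand);
}

# Render a location in the feature-table style seen in the test output.
sub location_string {
    my ($start, $end, $strand) = @_;
    my $range = "$start..$end";
    return $strand < 0 ? "complement($range)" : $range;
}

print location_string(revcomp_location(1, 4,  1, 8)), "\n";    # complement(5..8)
print location_string(revcomp_location(5, 8, -1, 8)), "\n";    # 1..4
```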

On Dec 21, 2009, at 1:20 PM, Chris Fields wrote:

> Saw that from the other day (LocatableSeq commit).  I'll check it out.
> 
> chris
> 
> On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:
> 
>> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>> 
>> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
>> #          got: '1..4'
>> #     expected: 'complement(5..8)'
>> 
>> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
>> #          got: 'complement(5..8)'
>> #     expected: '1..4'
>> # Looks like you failed 2 tests of 51.
>> 
>> MAJ


From maj at fortinbras.us  Mon Dec 21 16:27:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:27:25 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife>

I've modified Template:Doclink ; if you now do

{{Doclink|Bio::Graphics|cpan}}

you'll get a page with only the cpan link.

{{Doclink|Bio::SeqIO}}

etc. works as usual.
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 16:34:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:34:40 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife>

Also, applied the new Doclink to Bio::Graphics on wiki.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 21:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--

We didn't plan it, but we added it anyway: see revision 16559 of 
bioperl-live/trunk.
You can then do
$pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;
and even
$doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;

Thanks for the heads-up!
cheers,
MAJ
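
For anyone stuck on an older checkout, the same identifiers can be pulled straight from the JRNL records with a few lines of plain Perl. This is only a sketch and not the code in revision 16559; jrnl_ids is a hypothetical name, the record values below are invented for illustration, and it assumes the PMID and DOI sub-records each fit on one line.

```perl
use strict;
use warnings;

# Pull the PubMed ID and DOI out of the JRNL records of a PDB file.
# Takes the file's lines and returns a hash reference with the keys
# 'pubmed' and 'doi' (each present only if the record was found).
sub jrnl_ids {
    my @lines = @_;
    my %ids;
    for my $line (@lines) {
        if    ($line =~ /^JRNL\s+PMID\s+(\d+)/) { $ids{pubmed} = $1 }
        elsif ($line =~ /^JRNL\s+DOI\s+(\S+)/)  { $ids{doi}    = $1 }
    }
    return \%ids;
}

# Invented example records, not taken from a real PDB entry.
my @records = (
    'JRNL        TITL   AN EXAMPLE TITLE',
    'JRNL        PMID   12345678',
    'JRNL        DOI    10.1000/EXAMPLE',
);
my $ids = jrnl_ids(@records);
print "pubmed: $ids->{pubmed}\n";
print "doi: $ids->{doi}\n";
```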
----- Original Message ----- 
From: "Sungsam Gong" <sung at bio.cc>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations


> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
>           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
>               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
> for my $key (keys %{$hash_ref}) {
>   print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Is there any plan to include code that extracts the 'PMID'?
> Or did I miss something?
>
> Cheers,
> Sung
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From dan.kortschak at adelaide.edu.au  Mon Dec 21 22:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid
alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
bioperl-run tree).

I have 90% of what I want included in the module and would like some
advice from more experienced bioperlers. Feedback on approach is also
welcomed (this is my first significant wrapper, and after a long gap
from writing modules, so I am rusty). The module has ended up being
significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

     1. Informal tests run correctly (outside the t/ tree and Test
        harness), but formal Test harness tests fail for reasons I
        cannot understand. (The module is still lacking a lot of tests,
        but since things were failing in the harness I have placed them
        as a lower priority and have been working to my micro-script
        tests - yes, bad form.)
     2. I am having a big problem with IPC::Run for one of the
        executables (the module can call 5 different executables for 7
        commands), bowtie-maptool (module command 'map'). All the other
        commands tested (this excludes bowtie-maqconvert [convert
        command]) work fine, but maptool fails with an illegal seek -
        presumably due to the redirection handling? I have no idea how
        to resolve this, so help would be greatly appreciated (a small
        script that demonstrates the use that results in the failure is
        below).

There will be provision for returning a Bio::Assembly::IO object through
samtools in the finished module, but currently the
Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
	-command             => 'single',
	-max_seed_mismatches => 2,
	-seed_length         => 28,
	-max_qual_mismatch   => 70,
	-sam_format          => 0
	);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
	-command             => 'map'
	);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

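# Read the map output back in; fail loudly if the file cannot be opened
open(my $in, '<', $map) or die "Cannot open '$map': $!";
my @lines = <$in>;
my $lines = @lines;    # number of lines read
print @lines;
print "\n\n$lines\n";
close $in;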


From maj at fortinbras.us  Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <F7513FBADF944B51823A5F22FFA85911@NewLife>

Hey Dan, 
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could 
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'     => [qw( ind seq #out )],
    'paired'     => [qw( ind seq seq2 #out )],
    'crossbow'   => [qw( ind seq #out )],
    'build'      => [qw( ref out )],
    'inspect'    => [qw( ind >#out )],
    'convert'    => [qw( bwt out bfa )],
-    'map'        => [qw( bwt #out )]
+    'map'        => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.
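
For anyone reading the filespec strings above, here is a rough sketch of how a token such as '>#out' could be pulled apart. It rests on two assumptions: that a leading '>' means "redirect stdout to this file" and that '#' marks the file as optional. The function parse_filespec is made up for illustration and is not the parser the wrapper actually uses.

```perl
use strict;
use warnings;

# Split a filespec token into its name and flags.
# ASSUMPTIONS: '>' = capture stdout into this file, '#' = file is optional.
# parse_filespec is a hypothetical helper, not code from the module.
sub parse_filespec {
    my ($spec) = @_;

    # Optional '>' first, then optional '#', then the bare name
    my ($redirect, $optional, $name) = $spec =~ /^(>?)(#?)(\w+)$/
        or die "Unrecognised filespec '$spec'";

    return {
        name     => $name,
        redirect => $redirect ? 1 : 0,
        optional => $optional ? 1 : 0,
    };
}
```

Under that reading, the change from '#out' to '>#out' keeps the output file optional but asks for stdout to be captured into it rather than passing the file name on the command line.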

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release 
(http://bowtie-bio.sourceforge.net/index.shtml)...

cheers MAJ

----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module


> Hi,
> 
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
> 
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing module, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
> 
> There are a few issues I'm having, so I apologise for the list:
> 
>     1. Informal tests run correctly (outside the t/ tree and Test
>        harness), but formal Test harness tests fail for reasons I
>        cannot understand. (The module is still lacking a lot of tests,
>        but since things were failing in the harness I have placed them
>        as a lower priority and have been working to my micro-script
>        tests - yes, bad form.)
>     2. I am having a big problem with IPC::Run for one of the
>        executables (the module can call 5 different executables for 7
>        commands), bowtie-maptool (module command 'map'). All the other
>        commands tested (this excludes bowtie-maqconvert [convert
>        command]) work fine, but maptool fails with an illegal seek -
>        presumably due to the redirection handling? I have no idea how
>        to resolve this, so help would be greatly appreciated (a small
>        script that demonstrates the use that results in the failure is
>        below).
> 
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
> 
> Thanks for any help.
> Dan
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::Tools::Run::Bowtie;
> 
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
> 
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
> -command             => 'single',
> -max_seed_mismatches => 2,
> -seed_length         => 28,
> -max_qual_mismatch   => 70,
> -sam_format          => 0
> );
> 
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
> 
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
> -command             => 'map'
> );
> 
> my $map = $bowtiemap->run($align); # throws Illegal seek
> 
> print "$map\n";
> 
> open (IN,$map);
> my $lines =(my @lines)= <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
> 
> 


From dan.kortschak at adelaide.edu.au  Tue Dec 22 00:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <F7513FBADF944B51823A5F22FFA85911@NewLife>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
	<F7513FBADF944B51823A5F22FFA85911@NewLife>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a
specified file and run it that way, but I've tried the redirect as you
suggest, with the same failure result. I think it's a strangeness of
maptool (which may well be a reason for it being dropped - also note the
maptool output doesn't seem reasonable for the test data provided even
when run from the command line).

It's probably a result of a difficult interaction between IPC::Run and
maptool. Any funkiness in your code is not likely to be the cause, as I've
closely analysed what is being passed to IPC::Run. I've also quite
extensively modified the IPC::Run handling method from your code to
account for the difference between a single executable with many
commands, which the base code managed, and a cluster of executables each
taking a small subset of different filespecs, which is what bowtie needs.
My funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module.

(Should test maqconvert - if it fails, this will be dropped also unless
someone asks otherwise).

When the module copes with 0.11.* properly I'll start thinking about
0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan, 
> It looks like if the outfile isn't specified on the commandline for
> maptool, then the align is written to stdout. So, you could 
> try this workaround in Bowtie/Config.pm:
> 
> our %command_files = (
>     'single'     => [qw( ind seq #out )],
>     'paired'     => [qw( ind seq seq2 #out )],
>     'crossbow'   => [qw( ind seq #out )],
>     'build'      => [qw( ref out )],
>     'inspect'    => [qw( ind >#out )],
>     'convert'    => [qw( bwt out bfa )],
> -    'map'        => [qw( bwt #out )]
> +    'map'        => [qw( bwt >#out )]
>     );
> 
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
> 
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release 
> (http://bowtie-bio.sourceforge.net/index.shtml)...
> 
> cheers MAJ


From lovebaby39 at gmail.com  Wed Dec 23 05:48:55 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Wed, 23 Dec 2009 18:48:55 +0800
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>

Dear all

I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how 
to get "P.pastoris DNA for pPIC9K expression vector".

    while (my $result_u =  $blast_report_u-> next_result ) {
        while (my $hit_u = $result_u->next_hit()){
            while (my $hsp_u = $hit_u->next_hsp()){
                    $hit_u->name;
                    $hsp_u->evalue;
                    $hsp_u->score;
            }
        }
    }

I would appreciate it if you could tell me how to do it.

P.S. How can I download the BioPerl manual? (Is there a download link
for it?)


The following is the BLAST result:
-------------------------------------------------------------------------------------------------------------------------------------
BLASTN 2.2.16 [Mar-25-2007]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Query=
         (458 letters)

Database: UniVec (build 4.0)
           2416 sequences; 597,480 total letters
Searching..................................................done
                                                                             
                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1

>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
          Length = 2781

 Score = 26.3 bits (13), Expect = 3.1
 Identities = 13/13 (100%)
 Strand = Plus / Plus

Query: 352  tactaccgccatt 364
            |||||||||||||
Sbjct: 2209 tactaccgccatt 2221
-------------------------------------------------------------------------------------------------------------------------------------

Reginald Hsueh 


From hrh at fmi.ch  Wed Dec 23 10:14:06 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 23 Dec 2009 16:14:06 +0100
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>
Message-ID: <C757F24E.5FE2%hrh@fmi.ch>

Hi

Assuming you are using "SearchIO", try:

$hit_u->description

for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO
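
Put together with the loop from the question, a minimal sketch could look like this. The helper name hit_names_and_descriptions is made up for illustration; it only assumes that the report object behaves like a Bio::SearchIO parser, i.e. offers next_result, and that hits offer next_hit, name and description.

```perl
use strict;
use warnings;

# Collect [name, description] pairs for every hit in a report.
# $report is expected to behave like a Bio::SearchIO object, e.g.
#   my $report = Bio::SearchIO->new(-format => 'blast',
#                                   -file   => 'univec.blast');
# ('univec.blast' is a hypothetical file name.)
sub hit_names_and_descriptions {
    my ($report) = @_;
    my @rows;
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            # name is the ID (e.g. gnl|uv|Z46234.1:664-3444);
            # description is the rest of the definition line
            push @rows, [ $hit->name, $hit->description ];
        }
    }
    return @rows;
}
```

Each returned pair can then be printed with something like print join("\t", @$_), "\n" for hit_names_and_descriptions($report);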


Regards, Hans


On 12/23/09 11:48 AM, "Hsueh" <lovebaby39 at gmail.com> wrote:

> Dear all
> 
> I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how
> to get "P.pastoris DNA for pPIC9K expression vector".
> 
>     while (my $result_u =  $blast_report_u-> next_result ) {
>         while (my $hit_u = $result_u->next_hit()){
>             while (my $hsp_u = $hit_u->next_hsp()){
>                     $hit_u->name;
>                     $hsp_u->evalue;
>                     $hsp_u->score;
>             }
>         }
>     }
> 
> I would appreciate it if you could tell me how to do it.
> 
> P.S. How can I download the BioPerl manual? (Is there a download link
> for it?)
> 
> 
> 
> The following is the BLAST result:
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> BLASTN 2.2.16 [Mar-25-2007]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (458 letters)
> 
> Database: UniVec (build 4.0)
>            2416 sequences; 597,480 total letters
> Searching..................................................done
>                  
>                                         Score    E
> Sequences producing significant alignments:
> (bits)     Value
> 
> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...
> 26   3.1
> gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo
> 26   3.1
> gnl|uv|U13843.1:1887-9923 pBPV cloning vector
> 26   3.1
> 
>> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
>           Length = 2781
> 
>  Score = 26.3 bits (13), Expect = 3.1
>  Identities = 13/13 (100%)
>  Strand = Plus / Plus
> 
> Query: 352  tactaccgccatt 364
>             |||||||||||||
> Sbjct: 2209 tactaccgccatt 2221
> ------------------------------------------------------------------------------
> -------------------------------------------------------
> 
> Reginald Hsueh 
> 


From pkuonline at gmail.com  Wed Dec 23 13:36:49 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 12:36:49 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
Message-ID: <200912231236490784820@gmail.com>

Hi Everyone,

I used the latest BioPerl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz, and tried to parse a CODEML result. I searched the mailing list and found that the current PAML parser is compatible with PAML 4.3a (http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html). However, Ziheng Yang recently updated PAML to 4.3b, and I found that the parser does not work with it. More strangely, I tested it on an old PAML 4.1 result and it also failed.

I have attached my CODEML outputs here in case you have any ideas.

Many thanks in advance!
 				
Best regards,
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.1
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0012.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.1
Type: application/octet-stream
Size: 11635 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0013.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.3b
Type: application/octet-stream
Size: 11330 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0014.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.3b
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0015.obj>

From cjfields at illinois.edu  Wed Dec 23 16:19:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 23 Dec 2009 15:19:48 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231236490784820@gmail.com>
References: <200912231236490784820@gmail.com>
Message-ID: <B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>

Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken the parser in just about every PAML revision.  Unfortunately I'm not sure when this will be addressed; my hope is sooner rather than later.

Can you file a bioperl bug report for this?  It's the best place to keep track.

http://bugzilla.open-bio.org/

chris

On Dec 23, 2009, at 12:36 PM, pkuonline wrote:

> Hi Everyone,
> 
> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> 
> I attached my CODEML outputs here to see whether you guys have some idea. 
> 
> Many thanks ahead!
> 				
> Best regards,
> -------------------------------------------------------------
> Yong Zhang
> Ph.D, Research Scholar
> Manyuan Long's Lab
> University of Chicago
> <rst4.1><mlc4.1><mlc4.3b><rst4.3b>


From pkuonline at gmail.com  Wed Dec 23 17:45:54 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 16:45:54 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
Message-ID: <200912231645536094087@gmail.com>

Hi Chris,

Thanks for your reply. I have just submitted this bug to Bugzilla.

Have a nice holiday!
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago

>-------------------------------------------------------------
>From: Chris Fields
>Time: 2009-12-23 15:19:50
>To: pkuonline  bioperl-l
>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1

>Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken in just about every PAML parser revision.  Not sure when this will be addressed unfortunately, my hope is sooner than later.
>
>Can you file a bioperl bug report for this?  It's the best place to keep track.
>
>http://bugzilla.open-bio.org/
>
>chris
>
>On Dec 23, 2009, at 12:36 PM, pkuonline wrote:
>
>> Hi Everyone,
>> 
>> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
>> 
>> I attached my CODEML outputs here to see whether you guys have some idea. 
>> 
>> Many thanks ahead!
>> 				
>> Best regards,
>> -------------------------------------------------------------
>> Yong Zhang
>> Ph.D, Research Scholar
>> Manyuan Long's Lab
>> University of Chicago
>> <rst4.1><mlc4.1><mlc4.3b><rst4.3b>
>


From David.Messina at sbc.su.se  Wed Dec 23 18:23:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 24 Dec 2009 00:23:44 +0100
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se>

Hi Yong,

Could you attach your codeml output to the bug report, too?

I'll take a look at this as soon as I can.


Dave


From maj at fortinbras.us  Thu Dec 24 00:47:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 24 Dec 2009 00:47:10 -0500
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife>

Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ
----- Original Message ----- 
From: "pkuonline" <pkuonline at gmail.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "bioperl-l" <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 23, 2009 5:45 PM
Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1


> Hi Chris,
>
> Thanks for your reply and I just submitted this bug to bugzilla.
>
> Have a nice holiday!
> -------------------------------------------------------------
> Yong Zhang
> Ph.D, Research Scholar
> Manyuan Long's Lab
> University of Chicago
>
>>-------------------------------------------------------------
>>From: Chris Fields
>>Time: 2009-12-23 15:19:50
>>To: pkuonline  bioperl-l
>>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
>
>>Well, not completely unexpected, but very frustrating nonetheless.  Changes to 
>>PAML output have broken in just about every PAML parser revision.  Not sure 
>>when this will be addressed unfortunately, my hope is sooner than later.
>>
>>Can you file a bioperl bug report for this?  It's the best place to keep 
>>track.
>>
>>http://bugzilla.open-bio.org/
>>
>>chris
>>
>>On Dec 23, 2009, at 12:36 PM, pkuonline wrote:
>>
>>> Hi Everyone,
>>>
>>> I used the latest Bioperl build, 
>>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to 
>>> parse CODEML result. I searched the mail list and found current PAML parser 
>>> is compatible with PAML 4.3a, 
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. 
>>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser 
>>> does not work. More strangely, I tested it on the old PAML 4.1 result and 
>>> also failed.
>>>
>>> I attached my CODEML outputs here to see whether you guys have some idea.
>>>
>>> Many thanks ahead!
>>>
>>> Best regards,
>>> -------------------------------------------------------------
>>> Yong Zhang
>>> Ph.D, Research Scholar
>>> Manyuan Long's Lab
>>> University of Chicago
>>> <rst4.1><mlc4.1><mlc4.3b><rst4.3b>
>>
>
>
>




From bhakti.dwivedi at gmail.com  Fri Dec 25 21:46:51 2009
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Fri, 25 Dec 2009 21:46:51 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
Message-ID: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>

Hi,

Does anyone know how to retrieve the "Source" or the "Species name" given
the accession number using Bioperl.   I have these 30,000 accession numbers
for which I need to get the source organisms.  Any kind of help will be
appreciated.

Thanks

BD


From maj at fortinbras.us  Fri Dec 25 22:52:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 25 Dec 2009 22:52:10 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
        -db    => 'taxonomy',
        -id    => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
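
The chunking itself could be sketched as below. chunk_ids is a made-up helper name, and the chunk size is whatever you choose within the range suggested above; the function only splits the list and does not talk to NCBI.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements.
# chunk_ids is a hypothetical helper, not part of BioPerl.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "chunk size must be a positive number" unless $size && $size > 0;

    my @copy = @$ids;    # work on a copy so the caller's array is untouched
    my @chunks;
    push @chunks, [ splice @copy, 0, $size ] while @copy;
    return @chunks;
}

# Usage sketch (run_queries is a hypothetical wrapper around the
# elink/esummary steps shown above):
#   for my $chunk (chunk_ids(\@all_ids, 1000)) {
#       run_queries($chunk);
#       sleep 3;
#   }
```

Each element returned is an array reference, so it can be passed straight to -id => $chunk.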
MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?


> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl.   I have these 30,000 accession numbers
> for which I need to get the source organisms.  Any kind of help will be
> appreciated.
>
> Thanks
>
> BD


From cjfields at illinois.edu  Sat Dec 26 06:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <AD7C8B9A-61D1-443C-952E-BC7C66E398B2@illinois.edu>


On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ

The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec.

chris


From arpm9 at charter.net  Sun Dec 27 16:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l]  Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

hi chris,
 I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm


From pengyu.ut at gmail.com  Tue Dec 29 11:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython to
comment on the pros and cons of bioperl and biopython? I'm sending
this email to both the bioperl and biopython mailing lists, but I hope
that it will not result in any contention.

I assume that the functionality of bioperl and biopython is the
same, i.e., tasks that can be done in bioperl can be done in biopython
and vice versa, as both libraries have been out there for over 10 years.
Please correct me if my understanding is not true.

Given a task that can be done with either bioperl or biopython,
I particularly want to know how long it will take to write the
code for the task in bioperl and in biopython, with the same readability
requirement (see below) and the assumption that users have the same
fluency in perl and python.

python is claimed to be good for maintainability. But perl is
criticized for there-are-many-ways-for-a-given-task. Since there are
multiple ways in perl, let us assume that we always use perl in a
readable way.


From jason at bioperl.org  Tue Dec 29 11:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or  
just curious about the advantages/disadvantages of language choice?

-jason
On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:

> May I ask somebody who is versatile in both bioperl and biopython to
> comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both the bioperl and biopython mailing lists, but I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I particularly want to know how long it will take to write the
> code for the task in bioperl and in biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From ak at ebi.ac.uk  Tue Dec 29 11:57:18 2009
From: ak at ebi.ac.uk (Andreas =?iso-8859-1?B?S+Ro5HJp?=)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who are versitile in both bioperl and biopython
> comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
> 
> I assume that the functionality between bioperl or biopython is the
> same, i.e., tasks can be done in bioperl can be done biopython and
> vice versa, as both libraries have been out there over 10 years.
> Please correct me if my understanding is not true.
> 
> Given that a task that can be done with either bioperl or biopython,
> I, in particularly, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
> 
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is
the same:  Which of the two programming languages are you (or your team)
most proficient in?  Use that language.

Regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From sdavis2 at mail.nih.gov  Tue Dec 29 12:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> May I ask somebody who are versitile in both bioperl and biopython
> comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality between bioperl or biopython is the
> same, i.e., tasks can be done in bioperl can be done biopython and
> vice versa, as both libraries have been out there over 10 years.
> Please correct me if my understanding is not true.

The two projects have similar goals, but saying that the functionality
is the same would be an extreme oversimplification.  You will need to
define what you want to do and then check to see what the two projects
have to offer.  This will, in general, require perusing the websites
for both projects as well as the relevant documentation.

> Given that a task that can be done with either bioperl or biopython,
> I, in particularly, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.

Again, you will want to define the task(s) to be accomplished and then
weigh the pros and cons of each project combined with local expertise.
 If you don't know what you want to do, then you can certainly read
some examples on the websites and see which project strikes you as a
"winner" for you.

> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

These two statements are generalizations that provide little insight
into the strengths or weaknesses of the languages.  In other words,
one can write good or bad code in both languages.

Hope that helps.

Sean


From wenzhiwang1983 at yahoo.com.cn  Tue Dec 29 13:30:02 2009
From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi)
Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST)
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com>

Dear Jason,

Plink is a very useful program in population genetics, especially in the era of genome-wide SNP scans. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO?

Thanks.

Wenzhi Wang
   State Key Laboratory of Genetic Resources and Evolution
   Kunming Institute of Zoology, Chinese Academy of Sciences
   Kunming, Yunnan 650223 P. R. China
   Tel:  86 871 5198 993
   Fax: 86 871 5195 430
   E-mail: wenzhiwang1983 at yahoo.com.cn




From pengyu.ut at gmail.com  Tue Dec 29 13:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.

On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich <jason at bioperl.org> wrote:
> Are you asking for the purposes of choosing a toolkit for your work or just
> curious about the advantages/disadvantages of language choice?
>
> -jason
> On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:
>
>> May I ask somebody who are versitile in both bioperl and biopython
>> comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality between bioperl or biopython is the
>> same, i.e., tasks can be done in bioperl can be done biopython and
>> vice versa, as both libraries have been out there over 10 years.
>> Please correct me if my understanding is not true.
>>
>> Given that a task that can be done with either bioperl or biopython,
>> I, in particularly, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From pengyu.ut at gmail.com  Tue Dec 29 14:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> May I ask somebody who are versitile in both bioperl and biopython
>> comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality between bioperl or biopython is the
>> same, i.e., tasks can be done in bioperl can be done biopython and
>> vice versa, as both libraries have been out there over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification.  You will need to
> define what you want to do and then check to see what the two projects
> have to offer.  This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

According to your experience, are there some tasks that are easier
with one than with another?

>> Given that a task that can be done with either bioperl or biopython,
>> I, in particularly, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>
> Again, you will want to define the task(s) to be accomplished and then
> weigh the pros and cons of each project combined with local expertise.
>  If you don't know what you want to do, then you can certainly read
> some examples on the websites and see which project strikes you as a
> "winner" for you.
>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> These two statements are generalizations that provide little insight
> into the strengths or weaknesses of the languages.  In other words,
> one can write good or bad code in both languages.
>
> Hope that helps.
>
> Sean
>


From alperyilmaz at gmail.com  Tue Dec 29 14:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO,
	Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID: <dac81b0d0912291136x53edf2cjc6728e7062bd3bc1@mail.gmail.com>

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by
using a specific node as the root. I used Bio::Tree::Draw::Cladogram for
drawing and encountered some problems.

When I use whole tree and draw it, everything is fine; but, when I
pick a particular node and construct the subtree from that node's
ancestor by using "my $subtree = Bio::Tree::Tree->new(-root =>
$new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a
faulty EPS file, which contains extra lines added in the middle of the
file.
For instance:
.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
 81.25 lineto
  lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto


Also, I tried to write the subtree into a new phyloxml file first,
then draw it. The code is shown as follows:
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

# $subtree is the Bio::Tree::Tree built from $new_root as described above.
my $savefile = "save.phyloxml";
my $treeout  = Bio::TreeIO->new(-format => 'phyloxml',
                                -file   => ">$savefile");
$treeout->write_tree($subtree);

# Read the saved subtree back in and draw it.
my $tree2 = Bio::TreeIO->new(-format => 'phyloxml',
                             -file   => $savefile);
my $t1 = $tree2->next_tree;
my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree   => $t1,
                                           -top    => 10,
                                           -bottom => 10);
$obj1->print(-file => $image_output);

The generated phyloxml file, save.phyloxml, has an additional
newline between "</phylogeny>" and "</phyloxml>" at the end of the
file. This additional newline leads to an error when the file is
parsed (opening the file and drawing the EPS). After I removed the
newline manually, Bio::Tree::Draw::Cladogram produced the EPS file
successfully.
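
The manual edit described above could probably be scripted as a stopgap.
What follows is only a minimal sketch in plain Perl, not a BioPerl API: the
helper names strip_ws_before_close and fix_phyloxml_file are hypothetical,
and it assumes that the whitespace between the two closing tags is the only
thing upsetting the parser.

```perl
use strict;
use warnings;

# Hypothetical helper: remove all whitespace (including newlines) between
# the closing </phylogeny> tag and the closing </phyloxml> tag.
sub strip_ws_before_close {
    my ($xml) = @_;
    $xml =~ s{(</phylogeny>)\s+(</phyloxml>)}{$1$2}g;
    return $xml;
}

# Hypothetical helper: apply the fix to a file in place.
sub fix_phyloxml_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $xml = do { local $/; <$in> };   # slurp the whole file
    close $in;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} strip_ws_before_close($xml);
    close $out;
}

# Usage: perl fix_phyloxml.pl save.phyloxml
fix_phyloxml_file($ARGV[0]) if @ARGV;
```

Running this on save.phyloxml between the write_tree and next_tree steps
would mimic the manual edit; it does not address why the writer emits the
extra newline in the first place.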

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?

The phyloxml file I used can be downloaded from:
http://grassius.org/download/HSF.phyloxml

Run this code with the phyloxml file to see newline character problem:
http://pastebin.com/f87ee1ee

Run this code with the phyloxml file to see faulty eps file problem:
http://pastebin.com/fc4715a1

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From pengyu.ut at gmail.com  Tue Dec 29 16:32:17 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 15:32:17 -0600
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>

http://bioperl.org/Core/Latest/modules.html

Many links, if not all, are broken on the above page. Could somebody fix them?

For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
I see the following error.

There is currently no text in this page. You can search for this page
title in other pages, search the related logs, or edit this page.


From jason at bioperl.org  Tue Dec 29 16:49:00 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:49:00 -0800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>

That is an outdated URL; I am not sure where you are linking to it from.
We can probably now disable all old '/Core' URLs.

All documentation links are in the /wiki/

The beginner's howto is here for example
  http://bioperl.org/wiki/HOWTO:Beginners

> http://www.bioperl.org/wiki/HOWTOs


On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:

> http://bioperl.org/Core/Latest/modules.html
>
> Many links if not all are broken on the above pages. Could somebody  
> fix it?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID: <AA645194-F78E-4484-8952-02C40C1270F4@bioperl.org>

Yep, it would be great if someone were to write it.  This being a
volunteer project, we welcome your contribution.  No, I don't
specifically have plans to do it, but maybe you, or another bioperl
user/developer interested in population genetics, can give it a try?

-jason
On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> Plink is a very useful program in the population genetics,  
> especially in the Genome-Wide SNP scan era. Is there any plan to add  
> the Plink (ped or tped) format to Bio::PopGen::IO?
>
> Thanks.
>
> Wenzhi Wang
>   State Key Laboratory of Genetic Resources and Evolution
>   Kunming Institute of Zoology, Chinese Academy of Sciences
>   Kunming, Yunnan 650223 P. R. China
>   Tel:  86 871 5198 993
>   Fax: 86 871 5195 430
>   E-mail: wenzhiwang1983 at yahoo.com.cn
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 16:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org>


On Dec 29, 2009, at 11:15 AM, Peng Yu wrote:

> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>  
> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com>  
>> wrote:
>>> May I ask somebody who are versitile in both bioperl and biopython
>>> comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the  
>> functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two  
>> projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?

As you have still failed to give much insight into the 'tasks' it is  
hard to give you a better answer.

If there is a module or set of routines already written then yes one  
might be easier than the other. Otherwise it just depends on your  
strengths in the programming language.
We discussed the strengths of the different toolkits briefly on the  
podcast last month.  http://twit.tv/floss96

I echo Sean. Use whichever language you are a better programmer in.   
BioPerl is more mature in some facets than is BioPython, but BioPython  
has some components that are more heavily developed and supported than  
BioPerl (structures being one of those and interfacing that to pyMol  
would be a strength).   I personally think the Gbrowse, Bio-Graphics,  
and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence  
databases and Features is a critical aspect of mining  genomic data  
and features and use these heavily in my work, making BioPerl easy and  
powerful for my tasks. That and sequence and alignment parsing and  
reformatting.  But there are comparable tools written in python with  
and without BioPython that you can also use so mainly it is about  
building up expertise in a toolkit and going forward.  The BioPerl
faithful will probably say it is a more useful toolkit to us, but we are
of course a biased sample.

Both projects can benefit from more users and developers contributing  
code and documentation so I would just jump in and give it a try if  
you are unsure which will be easier for you.

>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same  
>>> readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and  
>> then
>> weigh the pros and cons of each project combined with local  
>> expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From pengyu.ut at gmail.com  Tue Dec 29 17:01:05 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:01:05 +1800
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and CDS boundary for 	a RefSeq ID?
Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>

I see the following example, but it is not clear to me how to get the
exon sequences. I also want to get the exon boundaries and the
associated CDS boundaries. Although I can get the boundary information
from the UCSC table browser, it would be convenient if I could get it
in bioperl along with the sequence.

Could somebody let me know how to do it?

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html


From sdavis2 at mail.nih.gov  Tue Dec 29 17:13:30 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 17:13:30 -0500
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com>

On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> http://bioperl.org/Core/Latest/modules.html
>
> Many links if not all are broken on the above pages. Could somebody fix it?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

It is unfortunate that the links are broken on that page.  However, I
believe that page is somewhat outdated, anyway.  Here are the HOWTO
pages:

http://www.bioperl.org/wiki/HOWTOs

Sean


From pengyu.ut at gmail.com  Tue Dec 29 17:21:16 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:21:16 +1800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
	<A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com>

On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich <jason at bioperl.org> wrote:
> That is an outdated URL I am not sure where you are linking it from. We can
> probably now disable all old '/Core' URLs.

I followed the link from here.

http://www.bioperl.org/wiki/BioPerl_Tutorial

Since those URLs are outdated, could you please fix the links on the above page?

> All documentation links are in the /wiki/
>
> The beginner's howto is here for example
>  http://bioperl.org/wiki/HOWTO:Beginners
>
>> http://www.bioperl.org/wiki/HOWTOs
>
>
> On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:
>
>> http://bioperl.org/Core/Latest/modules.html
>>
>> Many links if not all are broken on the above pages. Could somebody fix
>> it?
>>
>> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
>> I see the following error.
>>
>> There is currently no text in this page. You can search for this page
>> title in other pages, search the related logs, or edit this page.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From sdavis2 at mail.nih.gov  Tue Dec 29 18:06:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 18:06:17 -0500
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and 	CDS boundary for a RefSeq ID?
In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I see the following example. But it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and associated
> CDS boundaries. Although, I can get the boundary information from ucsc
> table browser, but it would be convenient if I can get it in bioperl
> along with the sequence.
>
> Could somebody let me know how do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.  There may be some confusion, as the UCSC database aligns
RefSeq sequence to a genome to generate exon start and end
coordinates.  However, the RefSeq records retrieved by Bio::DB::RefSeq
are not in genomic context and so do not have start and end locations
on the genome.  That is, if you want the starts and ends along the
genome, that information is not available from the RefSeq record
itself, I don't think.  If that is what you need (genomic
coordinates), you can download the information directly from UCSC,
download flat files from NCBI mapview, or even from ensembl (using
biomart, for instance).  If you are looking for a bioperl-compliant
way of doing this, look at the Ensembl Perl API.
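
For the transcript-relative part, something along these lines might work.
It is only a rough sketch: the helper name exon_cds_features is
hypothetical, the coordinates it reports are positions along the RefSeq
record (not the genome), and whether a given record actually carries
'exon' features depends on how it was annotated.

```perl
use strict;
use warnings;

# Hypothetical helper: collect exon and CDS features from a Bio::SeqI
# object as [tag, start, end, sequence] rows. Coordinates are relative
# to the record itself, not to any genome assembly.
sub exon_cds_features {
    my ($seq) = @_;
    my @rows;
    for my $feat ($seq->get_SeqFeatures) {
        my $tag = $feat->primary_tag;
        next unless $tag eq 'exon' || $tag eq 'CDS';
        push @rows, [ $tag, $feat->start, $feat->end,
                      $feat->spliced_seq->seq ];
    }
    return @rows;
}

# Usage: perl refseq_features.pl NM_006732
if (@ARGV) {
    require Bio::DB::RefSeq;    # needs network access to fetch the record
    my $db  = Bio::DB::RefSeq->new;
    my $seq = $db->get_Seq_by_acc($ARGV[0]);
    printf "%s\t%d\t%d\n", @{$_}[0 .. 2] for exon_cds_features($seq);
}
```

The same helper should work on any Bio::SeqI object, for example one read
from a local GenBank file with Bio::SeqIO, since it only relies on the
features attached to the sequence.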

Sean


From jkhilmer at gmail.com  Tue Dec 29 14:55:18 2009
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 29 Dec 2009 12:55:18 -0700
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>

Personally, I think that the differences between Python and Perl
(although substantial) are not large enough to make the language
itself the deciding factor.

Instead, consider the larger community of software.  I haven't yet
found a situation in which Python cannot be applied: it can be used
with R (statistics); lower-level code in C or Fortran; visualization
software such as PyMol, Chimera, Blender, VTK; plotting with
matplotlib; and scipy/numpy or sage, which provide innumerable
benefits for computation, data-processing, etc.

Although I don't claim to have a great deal of experience with Perl, I
haven't seen the same integration with that language: I'm assuming it
can be used with R and VTK (not sure about C or Fortran?).  For this
reason, unless your work is highly targeted and you have no use for
programming language integration with other software, I would
recommend Python.

For perl experts, I would truly appreciate any corrections you could
offer to these observations of mine, since I wouldn't mind using perl
if it offers benefits either in general or for specific applications.


Jonathan

On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> May I ask somebody who are versitile in both bioperl and biopython
>>> comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?
>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From wgheath at gmail.com  Tue Dec 29 15:16:39 2009
From: wgheath at gmail.com (William Heath)
Date: Tue, 29 Dec 2009 12:16:39 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
	<81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
Message-ID: <f08ddf990912291216h32988b8cv20830c1b6701caf6@mail.gmail.com>

The biggest reason to go with python is the ease of use.  Biologists are not
programmers and the learning curve for python is much gentler than that of
perl.  I like perl but choose python because of this issue.  Perl 6 does
address some of these issues, but it has not been fully implemented
yet.

-Tim

P.S.

I love, love, love CPAN though, which is only for perl right now :(

On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer <jkhilmer at gmail.com> wrote:

> Personally, I think that the differences between Python and Perl
> (although substantial) are not large enough to make the language
> itself the deciding factor.
>
> Instead, consider the larger community of software.  I haven't yet
> found a situation in which Python cannot be applied: it can be used
> with R (statistics); lower-level code C or fortran; visualization
> software such as PyMol, Chimera, Blender, VTK; plotting with
> matplotlib; and scipy/numpy or sage, which provide innumerable
> benefits for computation, data-processing, etc.
>
> Although I don't claim to have a great deal of experience with Perl, I
> haven't seen the same integration with that language: I'm assuming it
> can be used with R and VTK (not sure about C or fortran?).  For this
> reason, unless your work is highly targeted and you have no use for
> programming language integration with other software, I would
> recommend Python.
>
> For perl experts, I would truly appreciate any corrections you could
> offer to these observations of mine, since I wouldn't mind using perl
> if it offers benefits either in general or for specific applications.
>
>
> Jonathan
>
> On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>> May I ask somebody who is versatile in both bioperl and biopython to
> >>> comment on the pros and cons of bioperl and biopython? I'm sending
> >>> this email to both bioperl and biopython mailing lists. But I hope
> >>> that it will not result in any contention.
> >>>
> >>> I assume that the functionality of bioperl and biopython is the
> >>> same, i.e., tasks that can be done in bioperl can be done in biopython
> >>> and vice versa, as both libraries have been out there over 10 years.
> >>> Please correct me if my understanding is not true.
> >>
> >> The two projects have similar goals, but saying that the functionality
> >> is the same would be an extreme oversimplification.  You will need to
> >> define what you want to do and then check to see what the two projects
> >> have to offer.  This will, in general, require perusing the websites
> >> for both projects as well as the relevant documentation.
> >
> > According to your experience, are there some tasks that are easier
> > with one than with another?
> >
> >>> Given a task that can be done with either bioperl or biopython,
> >>> I, in particular, want to know how long it will take to write the
> >>> code for the task in bioperl and biopython, with the same readability
> >>> requirement (see below) and the assumption that users have the same
> >>> fluency in perl and python.
> >>
> >> Again, you will want to define the task(s) to be accomplished and then
> >> weigh the pros and cons of each project combined with local expertise.
> >>  If you don't know what you want to do, then you can certainly read
> >> some examples on the websites and see which project strikes you as a
> >> "winner" for you.
> >>
> >>> python is claimed to be good for maintainability. But perl is
> >>> criticized for there-are-many-ways-for-a-given-task. Since there are
> >>> multiple ways in perl, let us assume that we always use perl in a
> >>> readable way.
> >>
> >> These two statements are generalizations that provide little insight
> >> into the strengths or weaknesses of the languages.  In other words,
> >> one can write good or bad code in both languages.
> >>
> >> Hope that helps.
> >>
> >> Sean
> >>
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From pengyu.ut at gmail.com  Wed Dec 30 12:26:45 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 11:26:45 -0600
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>

With Bio::SeqIO, I can only read in the records in a fasta file one by
one. This is preferable if there are many records in a file.

But I also want to read all the records in. I could use a while loop
to read all records in. But could somebody let me know if there is a
function in bioperl that can read in all the records at once and return
me an object?

http://www.bioperl.org/wiki/HOWTO:SeqIO


From sdavis2 at mail.nih.gov  Wed Dec 30 13:04:53 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 30 Dec 2009 13:04:53 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>

On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?

In perl, you can use an array to store the records.  You could also
use a hash if you have reasonable keys for the entries.
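
A minimal sketch of the array and hash approach, assuming Bio::SeqIO is
installed. The FASTA contents and the names @records and %by_id are made up
for illustration, and the hash only works as shown if the IDs are unique:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Bio::SeqIO;

# Write a tiny FASTA file so the sketch is self-contained (made-up data).
my ($fh, $path) = tempfile(SUFFIX => '.fasta', UNLINK => 1);
print {$fh} ">seq1\nACGTACGT\n>seq2\nGGCC\n";
close $fh;

my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');

my (@records, %by_id);
while (my $seq = $in->next_seq) {
    push @records, $seq;                # array keeps the file order
    $by_id{ $seq->display_id } = $seq;  # hash gives lookup by ID
}

printf "%d records read\n", scalar @records;
print $by_id{seq2}->seq, "\n";
```

Every record stays in memory this way, so it only suits files that fit
comfortably in RAM.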

Sean


> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jason at bioperl.org  Wed Dec 30 14:58:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 Dec 2009 11:58:54 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org>

or use a database object so you can retrieve sequences that have a  
particular id. See Bio::DB::Fasta
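
A minimal sketch of that, assuming Bio::DB::Fasta is installed. The file
name and IDs are made up for illustration. Bio::DB::Fasta writes an index
next to the FASTA file, so the directory has to be writable:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Bio::DB::Fasta;

# Made-up FASTA file in a scratch directory, so the index has somewhere to go.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'example.fasta');
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} ">seq1\nACGTACGT\n>seq2\nGGCC\n";
close $out;

# Builds the index on first use and reuses it afterwards.
my $db = Bio::DB::Fasta->new($path);

my $whole = $db->seq('seq1');            # whole sequence as a string
my $part  = $db->seq('seq1', 3, 6);      # subsequence, 1-based inclusive
my $obj   = $db->get_Seq_by_id('seq2');  # sequence object for one ID

print "$whole\n$part\n", $obj->seq, "\n";
```

Sequences are fetched from disk on request rather than all being held in
memory.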
On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:

> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>> by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and  
>> return
>> me an object?
>
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Wed Dec 30 16:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

use Bio::AlignIO;

$alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
$aln = $alnio->next_aln;
@seqs = $aln->each_seq;

MAJ
----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?


> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From David.Messina at sbc.su.se  Thu Dec 31 05:55:32 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 31 Dec 2009 11:55:32 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
Message-ID: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From David.Messina at sbc.su.se  Tue Dec  1 10:14:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 1 Dec 2009 11:14:40 +0100
Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem
	to be parsed
In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk>
	<50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se>
	<8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <ECCDC4FE-DF46-4CF8-806F-750837DED8AA@sbc.su.se>

Hi Mick,

Did you try running the test case that you had originally attached to the bug report? Or is the below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave


On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
> 
> Just got round to looking at this.
> 
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
> 
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
> 
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
> 
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
> 
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
> 
> Is this another bug report?
> 
> Thanks again for all your help
> 
> Mick
> 
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
> 
> Hi Mick,
> 
> Sure thing -- the current build from subversion is packaged up every  
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
> 
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
> 
> 
> Dave
> 
> 
> 
> 
> On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote:
> 
>> Hi Dave
>> 
>> Thanks for the hard work.
>> 
>> Trying to get the latest updates so I can use this... don't have svn  
>> on my server, tried to install it and I don't have python either,  
>> which is needed to install it.
>> 
>> I face about 3 weeks whilst my IT department sort this out, unless I  
>> can access the changes any other way?
>> 
>> Thanks
>> Mick
>> 
>> -----Original Message-----
>> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- 
>> daemon at portal.open-bio.org]
>> Sent: 20 November 2009 15:12
>> To: michael watson (IAH-C)
>> Subject: [Bug 2937] Strand in fasta35 output does not seem to be  
>> parsed
>> 
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2937
>> 
>> 
>> online at davemessina.com changed:
>> 
>>          What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>            Status|NEW                         |RESOLVED
>>        Resolution|                            |FIXED
>> 
>> 
>> 
>> 
>> ------- Comment #7 from online at davemessina.com  2009-11-20 10:12 EST  
>> -------
>> Fixed in r16394.
>> 
>> Michael, thanks for the report. Your test cases pass, but please  
>> reopen the bug
>> if needed.
>> 
>> 
>> -- 
>> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? 
>> tab=email
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
> 


From e.osimo at gmail.com  Tue Dec  1 18:05:48 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 1 Dec 2009 19:05:48 +0100
Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test
Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com>

Hello everyone,
I'm trying to get the p value of a statistic made with Statistics::TTest
I cannot find this function: I can find if the null hypothesis is rejected
at a certain confidence level, but I cannot make the script show me the
actual p value.
Do you know other scripts that can do that?

Thanks
Emanuele


From cjfields at illinois.edu  Tue Dec  1 19:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below.  I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.

chris

Begin forwarded message:

> From: <utilities-announce at ncbi.nlm.nih.gov>
> Date: December 1, 2009 12:59:34 PM CST
> To: <utilities-announce at ncbi.nlm.nih.gov>
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
> 
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>  
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>  
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>  
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.
>  
> NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
>  
> Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.
>  
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
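
As a rough sketch of what the announcement asks for, a request URL that
always carries both parameters might be built as below. The tool name, the
e-mail address, the esearch endpoint and the query are illustrative
assumptions, not taken from the announcement, and the helper names are made
up:

```perl
use strict;
use warnings;

# Percent-encode everything outside the unreserved URI characters
# (ASCII input assumed).
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build an E-utility URL, refusing to proceed without tool and email.
sub eutil_url {
    my ($base, %param) = @_;
    for my $required (qw(tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required} && length $param{$required};
    }
    my $query = join '&',
        map { uri_escape_simple($_) . '=' . uri_escape_simple($param{$_}) }
        sort keys %param;
    return "$base?$query";
}

my $url = eutil_url(
    'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi',
    db    => 'pubmed',
    term  => 'bioperl',
    tool  => 'my_script',        # hypothetical tool name
    email => 'me@example.org',   # hypothetical contact address
);
print "$url\n";
```

Failing early on the client side gives a clearer message than waiting for
the server to return its error.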


From maj at fortinbras.us  Wed Dec  2 02:27:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 21:27:06 -0500
Subject: [Bioperl-l] test test test
Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife>

MAJ


From ocarnorsk138 at gmail.com  Wed Dec  2 02:59:48 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Tue, 1 Dec 2009 23:59:48 -0300
Subject: [Bioperl-l] test test test
In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>

test test test test back


O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.


2009/12/1 Mark A. Jensen <maj at fortinbras.us>

> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Wed Dec  2 03:08:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 22:08:23 -0500
Subject: [Bioperl-l] test test test
In-Reply-To: <b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
	<b099d0430912011859h7f225b4em8ca92aa3129e5e38@mail.gmail.com>
Message-ID: <CC7F9A12F9474D2BB5DC4E69190F2AE6@NewLife>

I love when people are paying attention!
  ----- Original Message ----- 
  From: Ocar Campos 
  To: Mark A. Jensen ; Bioperl Mailing List. 
  Sent: Tuesday, December 01, 2009 9:59 PM
  Subject: Re: [Bioperl-l] test test test


  test test test test back


  O'car Campos C.
  Bioinformatics Engineering Student.
  University of Talca.
  Chile.


  2009/12/1 Mark A. Jensen <maj at fortinbras.us>

    MAJ
    _______________________________________________
    Bioperl-l mailing list
    Bioperl-l at lists.open-bio.org
    http://lists.open-bio.org/mailman/listinfo/bioperl-l


From rtbio.2009 at gmail.com  Wed Dec  2 12:07:08 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Wed, 2 Dec 2009 13:07:08 +0100
Subject: [Bioperl-l] Remote blast
Message-ID: <c7cac1600912020407j176c83edm9f5a3d151f507bd2@mail.gmail.com>

Hello everyone,

I have a problem. I am new to Bioperl. I am working on an RNAi tool in which
a CGI script connects to NCBI BLAST using the RemoteBlast module.

The sequence entered on the HTML page is taken as input and a remote BLAST
is run on it, but I have a problem in the remote BLAST code.

My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
$input1= $_[0];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


 while (my $input = $str->next_seq())

{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

  print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
          my $filename = $result->query_name()."\.out";
           $factory->save_output($filename);
          $factory->remove_rid($rid);
         #       open(BLASTDEBUGFILE,'>',$blastdebugfile);
  #     print BLASTDEBUGFILE "Test1  $result";
   #     close(BLASTDEBUGFILE);

     open(OUTFILE,'>',$outfile);
     print OUTFILE "Test2 ", $result->database_name();
     close(OUTFILE);

    while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);

              # open(OUTFILE,'>',$outfile);
              # print OUTFILE "in while hits";
              #close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
}
# open(OUTFILE,'>',$outfile);
  #print OUTFILE $seqs[0];
 # close(OUTFILE);

return(@seqs);
}

In the above code, my program gets as far as the 'else' branch and writes
the output file, i.e., this step:
my $filename = $result->query_name()."\.out";

But the program never enters the next while loop, where I should get the
hits, i.e., this one:
while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);


Hence I am unable to get any hits for my query.
Ex: if the query's accession number is Tb11.02.2210, I just get a file named
Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.


From ashvip at gmail.com  Wed Dec  2 05:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install bioperl on my 32-bit Windows machine despite
repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh,
Senior Research Fellow,
Centre for Cellular and Molecular Biology,
Hyderabad - 500007
India.
contact - 91-040-27192778


From scott at scottcain.net  Wed Dec  2 14:18:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Dec 2009 09:18:37 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com>

Hello Vipin,

"do not seem to work" doesn't give us much to go on; can you tell us
what happened?

Scott


On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh <ashvip at gmail.com> wrote:
> Dear Sir/Madam,
> I have not been able to install bioperl on my 32-bit Windows machine despite
> repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
> neither seems to work.
> I have followed the instructions given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From maj at fortinbras.us  Wed Dec  2 14:18:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information; your commands, error messages you received.
Thanks, 
Mark
----- Original Message ----- 
From: "Vipin Singh" <ashvip at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 12:24 AM
Subject: [Bioperl-l] Problems with installation


> Dear Sir/Madam,
> I have not been able to install bioperl on my 32-bit Windows machine despite
> repeated attempts. I have tried both ActivePerl and Strawberry Perl, but
> neither seems to work.
> I have followed the instructions given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From bcantarel at som.umaryland.edu  Wed Dec  2 18:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>

Hi all,
I am not sure if this is normal, but when I use Bio::SeqIO to parse genbank files, it changes the coordinates of things on the minus strand.


For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates?  Is there a command for that or will I have to just do the math?

Feature or Bug?


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore


From maj at fortinbras.us  Wed Dec  2 19:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
Message-ID: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an
ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 1:36 PM
Subject: [Bioperl-l] Parsing Genbank


> Hi all,
> I am not sure if this is normal, but when I use Bio::SeqIO to parse genbank
> files, it changes the coordinates of things on the minus strand.
>
>
> For example, I have a sequence that has a CDS on the minus strand, and it is
> from 911 to 974.  The sequence is 974 nt.
>
> x $cds->start
> 1
> x $cds->end
> 64
>
> How can I get the original coordinates?  Is there a command for that or will I 
> have to just do the math?
>
> Feature or Bug?
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From bcantarel at som.umaryland.edu  Wed Dec  2 19:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
		       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
 F1:foreach $cds (@feats) {
	next F1 unless ($cds->primary_tag() eq 'CDS');
	#do something with the cds start and cds end
	}
}
	 

LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...


From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
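
For what it's worth, 1 and 64 are exactly what 911 and 974 become when a
974 nt sequence is reverse-complemented, so if that is what is going on,
doing the math would look roughly like the sketch below. The function name
is made up, and whether a reverse-complement step is really the cause here
is only a guess:

```perl
use strict;
use warnings;

# Map 1-based inclusive coordinates between a sequence and its reverse
# complement. Applying the mapping twice returns the original values.
sub flip_coords {
    my ($start, $end, $seq_length) = @_;
    return ($seq_length - $end + 1, $seq_length - $start + 1);
}

my @flipped  = flip_coords(911, 974, 974);   # forward coords -> revcomp coords
my @original = flip_coords(@flipped, 974);   # and back again
print "@flipped\n@original\n";
```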


~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
> 
> 
>> Hi all,
>> I am not sure if this is normal, but when I use Bio::SeqIO to parse genbank files, it changes the coordinates of things on the minus strand.
>> 
>> 
>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.  The sequence is 974 nt.
>> 
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>> 
>> How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
>> 
>> Feature or Bug?
>> 
>> 
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 


From maj at fortinbras.us  Wed Dec  2 19:48:44 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:48:44 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>

With fake sequence data and that header, I don't get a problem:

  DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'

Are you using the latest BioPerl (1.6.1 or the trunk)?
MAJ
----- Original Message ----- 
From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank


Here is some of my code, the real code actually enters the data into a database.


$in  = Bio::SeqIO->new(-file => $gbkfile,
                       '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
  F1:foreach $cds (@feats) {
    next F1 unless ($cds->primary_tag() eq 'CDS');
    ###>> debugger stops here for above output

    #do something with the cds start and cds end
  }
}


LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence?.


From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.




From cjfields at illinois.edu  Wed Dec  2 19:39:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 13:39:40 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu>
	<E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu>

That one's odd; the coordinates should relate back to the original sequence.  Any chance you could pass on the sequence file so we can confirm it?  (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)

chris

On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote:

> Here is some of my code, the real code actually enters the data into a database.
> 
> 
> $in  = Bio::SeqIO->new(-file => $gbkfile,
> 		       '-format' => 'genbank');
> 
> W1:while (my $seq = $in->next_seq()) {
>  my @feats = $seq->get_all_SeqFeatures();
>  my $j = 0;
> F1:foreach $cds (@feats) {
> 	next F1 unless ($cds->primary_tag() eq 'CDS');
> 	#do something with the cds start and cds end
> 	}
> }
> 	 
> 
> LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
> ACCESSION   subjpool12_contig3
> KEYWORDS    .
> SOURCE      human metagenome
>  ORGANISM  human metagenome
>            unclassified sequences; organismal metagenomes,metagenomes.
> FEATURES             Location/Qualifiers
>     source          1..974
>                     /mol_type="genomic DNA"
>                     /isolation_source="Homo sapiens"
>                     /organism="human metagenome"
>                     /collection_date="19-Nov-2009"
>     CDS             complement(911..974)
>                     /locus_tag="subjpool12_contig3|metagene|gene_2"
>                     /translation="IRIMTVELINPYIRHVEHST"
>                     /score="2.52804"
>                     /product="hypothetical protein"
>                     /note="score=2.52804"
>                     /note="score=2.52804"
>                     /note="frame=1"
> ORIGIN
> #some sequence?.
> 
> 
> 
> 
> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
> 
> 
> 


From maj at fortinbras.us  Wed Dec  2 20:52:28 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 15:52:28 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife>

Yes, 1.006 is 1.6.0. There is a later update, 1.6.1, but it sounds
as if there is a bug. If you can provide data that can reproduce
it, as Chris suggests, we can get onto it.
thanks MAJ
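
For anyone puzzled by how a decimal version such as 1.006 corresponds to a dotted release number, here is a small sketch using Perl's core version module. It only illustrates the general decimal-to-dotted convention; the 1.006001 value for 1.6.1 is my assumption by analogy, not something read from the BioPerl sources, and the helper name dotted is made up.

```perl
use strict;
use warnings;
use version;    # core module

# Convert a decimal version string to its normalized dotted form.
sub dotted {
    my ($decimal) = @_;
    return version->parse($decimal)->normal;
}

print dotted('1.006'),    "\n";    # prints v1.6.0
print dotted('1.006001'), "\n";    # prints v1.6.1
```

Each group of three digits after the decimal point becomes one component of the dotted version, which is why 1.006 reads as 1.6.0.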
  ----- Original Message ----- 
  From: Brandi Cantarel 
  To: Mark A. Jensen 
  Sent: Wednesday, December 02, 2009 3:38 PM
  Subject: Re: [Bioperl-l] Parsing Genbank


  How can I tell what version I am using? When I use the command from the website:


  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'


  I get 1.006, but the bioperl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release?


  Brandi




From cjfields at illinois.edu  Wed Dec  2 21:07:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 15:07:58 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife>
References: <B261E992-759C-44A7-9E75-8BA9709E1B0B@som.umaryland.edu><E6D24EA23E2C45208F98B98D0A6D4F7F@NewLife>
	<854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
	<24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
	<001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
	<07332179362A4D53ACAA9A72AD208049@NewLife>
Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu>

One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since GBrowse uses SeqFeatures extensively (any changes should have popped out along the way).

Not much we can do unless we have something to help confirm the problem.  It might also help to know the source of the GenBank file itself.

chris

On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote:

> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
> as if there is a bug. If you can provide data that can reproduce
> it, as Chris suggests, we can get onto it. 
> thanks MAJ


From lstein at cshl.edu  Thu Dec  3 10:31:31 2009
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 Dec 2009 05:31:31 -0500
Subject: [Bioperl-l] modENCODE seeking data managers
Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com>

Hi All,

My apologies for spamming the list, but this announcement may be of
interest:


The modENCODE Data Coordinating Center (Model Organism Encyclopedia of DNA
Elements; www.modencode.org) is seeking data managers to gather and curate
large scale functional genomics data sets in fly and worm. For details, see
http://blog.modencode.org/?p=350.


Lincoln

-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From dan.bolser at gmail.com  Thu Dec  3 11:44:40 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 11:44:40 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>

Hi, can someone test the script here on zero length fasta / qual files?

http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ


It seems the output has an extra newline in the sequence part of the
output, which throws off scripts that rely on the 'four lines per
record' structure of the fastq (although I'm not sure if it's illegal
fastq).

Here is what I see

BEGIN
$ head one.fna
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ head one.qual
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2
$ createFastq.plx one.fna one.qual
@FVF7ZWH02PFOVG


+FVF7ZWH02PFOVG

END


Currently I just put in a clause in the script to skip any zero length
sequences, but I think the Qual shouldn't output an extra newline like
this.


Cheers,
Dan.


--

JHB: Bioinformatics is Biology and Biology is Bioinformatics.


From biopython at maubp.freeserve.co.uk  Thu Dec  3 12:12:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 12:12:15 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>

On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi, can someone test the script here on zero length fasta / qual files?
>
> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>
> It seems the output has an extra newline in the sequence part of the
> output (which throws off scripts that rely on the 'four lines per
> record' structure of the fastq (although I'm not sure if it's illegal
> fastq).

Hi Dan,

The OBF consensus was that FASTQ records with a zero length
sequence might be useful, and should be output as exactly
four lines (one blank sequence line, one blank quality line).
However, for parsing, any number of blank lines should be OK.
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
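
To make the four-line convention concrete, here is a minimal plain-Perl sketch of a writer that emits exactly four lines per record, including for a zero length read. It is not the wiki script and not BioPerl's FASTQ writer; the function name fastq_record is made up, and repeating the identifier on the '+' line simply mirrors Dan's output.

```perl
use strict;
use warnings;

# Format one FASTQ record as exactly four lines, even when the
# sequence and quality strings are both empty.
sub fastq_record {
    my ($id, $seq, $qual) = @_;
    die "sequence and quality lengths differ for $id\n"
        unless length($seq) == length($qual);
    return join("\n", '@' . $id, $seq, '+' . $id, $qual) . "\n";
}

# A zero length read still yields four lines: header, blank
# sequence line, '+' line, blank quality line.
print fastq_record('FVF7ZWH02PFOVG', '', '');
```

Because every record ends with a single newline and contains no others, tools that read FASTQ in fixed blocks of four lines stay in step.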

I can confirm the perl script currently outputs a FASTQ file
with TWO blank lines for the sequence, giving five lines in
total for the zero length record. That does suggest a bug.
What version of BioPerl are you running?

Peter

P.S. The script is throwing away any description after the
identifier.


From dan.bolser at gmail.com  Thu Dec  3 13:07:27 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 13:07:27 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi, can someone test the script here on zero length fasta / qual files?
>>
>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>
>> It seems the output has an extra newline in the sequence part of the
>> output (which throws off scripts that rely on the 'four lines per
>> record' structure of the fastq (although I'm not sure if it's illegal
>> fastq).
>
> Hi Dan,
>
> The OBF consensus was FASTQ records with a zero length
> sequence might be useful, and should be output as exactly
> four lines (one blank sequence line, one blank quality line).
> However for parsing, any number of blank lines should be OK.
> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>
> I can confirm the perl script currently outputs a FASTQ file
> with TWO blank lines for the sequence, giving five lines in
> total for the zero length record. That does suggest a bug.
> What version of BioPerl are you running?

Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I
asked this question of the list rather than filing a bug report. What
version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
for the info).


> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly,
MediaWiki's diff features are less than optimal, so developing scripts
on the wiki isn't ideal. Anyone know how to plug git-hub into a script
apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could
basically build a whole release in the wiki. Which would be fun if
nothing else!


-- 

JHP: Biology is bioinformatics and bioinformatics is biology.


From heyne at informatik.uni-freiburg.de  Thu Dec  3 13:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

...

	my $ret;
	if ($self->strand() >= 0) {
		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
	} else {
		$ret = $id . $v. $char1 . $end . $char2 . $st ;
	}
	return $ret;
}
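
As a standalone illustration of the idea (not the actual LocatableSeq.pm code, which is elided above), the swap can be sketched in plain Perl. The function name format_nse and its argument order are assumptions made for this example only, and the separators are fixed to '/' and '-' rather than taken from $char1 and $char2.

```perl
use strict;
use warnings;

# Build a name/start-end string. On the minus strand the larger
# coordinate is written first, e.g. AB194432.1/908-846.
sub format_nse {
    my ($id, $version, $start, $end, $strand) = @_;
    my $v = (defined $version && length $version) ? ".$version" : '';
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;
    return "$id$v/$start-$end";
}

print format_nse('AB194432', 1, 846, 908, -1), "\n";   # prints AB194432.1/908-846
print format_nse('AB194432', 1, 846, 908,  1), "\n";   # prints AB194432.1/846-908
```

Start and end stay stored as start <= end; only the printed order depends on the strand.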

Then I noticed while using $aln->remove_seq() that it cannot
remove a sequence, because it uses the wrong NSE to look up sequences.
I changed the following:

SimpleAlign.pm

sub remove_seq {

...
	$id = $seq->id();
    	$start = $seq->start();
    	$end  = $seq->end();

## changed code:

	my $v = $seq->version ? '.'.$seq->version : '';
    	if ($seq->strand >=0){
		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
	} elsif ($seq->strand == -1){
		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
	}	
...

}

The above code in LocatableSeq.pm worked when I read an
alignment in stockholm format and wrote it out in clustalw format. But
when I read an alignment in clustalw and wrote it out as stockholm (or
something else) it didn't work, because the strand is not set correctly in
ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

...

	my ( $sname, $start, $end, $strand );	## strand added
	$strand = 0;				## new, standard = 0???
    	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
        	( $sname, $start, $end ) = ( $1, $2, $3 );
		$strand = 1;			## new			
		if ($start > $end) {		## new
       		($start, $end, $strand) = ($end, $start, -1); ##new
		}				## new
	
      }
        else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq   => $alignments{$name},
            -id    => $sname,
            -start => $start,
            -end   => $end,
	    -strand=> $strand			## new
        );

...

}
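
The parsing direction can be sketched the same way. This is a plain-Perl illustration of the rule "start greater than end means minus strand", not the ClustalW.pm code; parse_nse is a made-up name, and it assumes the '/' separator used in the Rfam names rather than the ':' in the regex above.

```perl
use strict;
use warnings;

# Split "name/start-end" into its parts. If the first coordinate is
# larger, swap the pair and record strand -1; otherwise strand is 1.
# Returns nothing when the name carries no coordinates.
sub parse_nse {
    my ($name) = @_;
    my ($id, $start, $end) = $name =~ m{^(\S+)/(\d+)-(\d+)$}
        or return;
    my $strand = 1;
    ($start, $end, $strand) = ($end, $start, -1) if $start > $end;
    return { id => $id, start => $start, end => $end, strand => $strand };
}

my $loc = parse_nse('AB194432.1/908-846');
print "$loc->{id} $loc->{start}..$loc->{end} strand $loc->{strand}\n";
# prints AB194432.1 846..908 strand -1
```

Note that the id keeps the version suffix here; splitting off ".1" is left out to keep the sketch short.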

So I don't know if I changed things in the right places. I found them
only because I used certain functions. I don't know how broadly
a changed NSE in LocatableSeq.pm affects other modules and
functions. But I'm happy with my changes (so far :-)...).

Will you change this to your proposed way in bioperl trunk?

Thanks!

steffen


Chris Fields wrote:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
> 
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
> 
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks.  Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
> 
> Does anyone have a problem with me adding this in?  It will change
> output for only those instances where the strand is -1, so
> 
> AB194432.1/908-846
> 
> would be start = 846, end = 908, strand = -1
> 
> AB194432.1/846-908
> 
> would be start = 846, end = 908, strand = 1
> 
> chris


-- 
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany

Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de


From cjfields at illinois.edu  Thu Dec  3 13:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>

Dan,

On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

> 2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
>> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> Hi, can someone test the script here on zero length fasta / qual files?
>>> 
>>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>>> 
>>> It seems the output has an extra newline in the sequence part of the
>>> output (which throws off scripts that rely on the 'four lines per
>>> record' structure of the fastq (although I'm not sure if it's illegal
>>> fastq).
>> 
>> Hi Dan,
>> 
>> The OBF consensus was FASTQ records with a zero length
>> sequence might be useful, and should be output as exactly
>> four lines (one blank sequence line, one blank quality line).
>> However for parsing, any number of blank lines should be OK.
>> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html
>> 
>> I can confirm the perl script currently outputs a FASTQ file
>> with TWO blank lines for the sequence, giving five lines in
>> total for the zero length record. That does suggest a bug.
>> What version of BioPerl are you running?
> 
> Hi Peter,
> 
> Basically, I'm not running the 'latest' version of BP, which is why I
> asked this question of the list rather than filing a bug report. What
> version are you running? ;-)
> 
> Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
> for the info).

FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN).  Basically, it now parses all three FASTQ variants.  However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1.  Peter can you confirm that?
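
For reference, the four-lines-per-record layout Peter describes, including the zero-length case and the description on the header line, can be sketched without BioPerl. This is not the Bio::SeqIO::fastq writer; fastq_record is a hypothetical helper, and Sanger quality encoding (Phred + 33) is assumed.

```perl
use strict;
use warnings;

# Hypothetical formatter, not the Bio::SeqIO::fastq implementation.
# $quals is an array ref of Phred scores; Sanger encoding (offset 33) is assumed.
sub fastq_record {
    my ($id, $desc, $seq, $quals) = @_;
    # Keep any description after the identifier on the header line.
    my $header = (defined $desc && length $desc) ? "$id $desc" : $id;
    my $qual   = join '', map { chr($_ + 33) } @$quals;
    # Always exactly four lines: header, sequence, '+', quality.
    return join("\n", '@' . $header, $seq, '+', $qual) . "\n";
}

print fastq_record('read1', 'first read', 'ACGT', [30, 30, 20, 10]);
print fastq_record('empty', undef, '', []);   # blank sequence and quality lines
```

The second call is the case under discussion: an empty sequence still yields four lines, with one blank sequence line and one blank quality line.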

>> Peter
>> 
>> P.S. The script is throwing away any description after the
>> identifier.
> 
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
> 
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc.  Think SourceForge, but a lot nicer and with no ads ;>

BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 14:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN).  Basically, it now parses
> all three FASTQ variants.  However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?

I had BioPerl from SVN circa 1.6.1 (not sure if this was before
or after the release of 1.6.1 now):

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069
$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
1.0069

If the tuples mean anything to you:

$ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
49.46.48.48.54.57
$ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
49.46.48.48.54.57
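
A note on reading those tuples: the `%vd` format prints the ordinal of each character of its argument, joined by dots, so applied to the plain string "1.0069" it gives character codes rather than a dotted version. A small sketch (the variable names are mine):

```perl
use strict;
use warnings;

my $version  = '1.0069';                  # the string printed by the first commands
my $ordinals = sprintf '%vd', $version;   # one integer per character, joined by '.'
print "$ordinals\n";                      # 49.46.48.48.54.57

# Turning the ordinals back into characters recovers the original string.
my $decoded = join '', map { chr } split /\./, $ordinals;
print "$decoded\n";                       # 1.0069
```

So both commands report the same value, and the second form adds no information here.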

I just updated to revision 16435, and retested. I get the same
BioPerl version numbers, and the same extra blank line in the
sequence FASTQ output as Dan reported.

Peter


From cjfields at illinois.edu  Thu Dec  3 14:39:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 08:39:35 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>

On Dec 3, 2009, at 8:20 AM, Peter wrote:

> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> 
>> FASTQ parsing had undergone a major revision prior to
>> 1.6.1 (the latest release in CPAN).  Basically, it now parses
>> all three FASTQ variants.  However, Peter indicates there
>> may still be a problem, and it's likely he's running 1.6.1.
>> Peter can you confirm that?
> 
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
> 
> If the tuples mean anything to you:
> 
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
> 
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
> 
> Peter

Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)?  Otherwise it might get lost on the mail list or wiki.

chris


From biopython at maubp.freeserve.co.uk  Thu Dec  3 14:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.

> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

  my $bsq_obj =
    Bio::Seq::Quality->
        new( -id   => $seq_obj->id,
             -seq  => $seq_obj->seq,
             -qual => $qual_obj->qual,
           );

With:

  my $bsq_obj =
    Bio::Seq::Quality->
        new( -id          => $seq_obj->id,
             -description => $seq_obj->description,
             -seq         => $seq_obj->seq,
             -qual        => $qual_obj->qual,
           );

Look - I seem to be learning Perl by osmosis ;)

Peter


From dan.bolser at gmail.com  Thu Dec  3 16:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
	<DF4F7E8A-A4D4-4B59-9445-23620378DB22@illinois.edu>
	<320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter <biopython at maubp.freeserve.co.uk>:
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> Okay I will try to look into it today (it should be an easy fix).  There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!


From dan.bolser at gmail.com  Thu Dec  3 16:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields <cjfields at illinois.edu>:
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs


From florent.angly at gmail.com  Thu Dec  3 18:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de>	<DF72C01A-410F-4391-B33E-4884D7CB859E@illinois.edu>
	<4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding 
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related
modules use LocatableSeq objects. They specify the sequence start but
not the sequence end. This works in most cases, but I've recently
encountered very occasional error messages related to not having
explicitly set the end of the sequence. I've been unable to put
together a small test case to reproduce the bug easily.

My question is: if the start of the sequence is set, is it mandatory to
set the end of the sequence? If so, then maybe the documentation needs
to be explicit about it, and maybe there needs to be a check that
enforces that the end is set. In fact, it seems that if I provide a
sequence and its start position, the LocatableSeq code should be able to
calculate its end automatically, no?
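
To make the idea concrete, here is a minimal sketch of the calculation I have in mind, in plain Perl. It is not LocatableSeq code: implied_end is a hypothetical helper, and I am assuming that '-' and '.' are the only gap symbols.

```perl
use strict;
use warnings;

# Hypothetical helper, not a LocatableSeq method. Gap symbols are assumed
# to be '-' and '.'; adjust the character class for other conventions.
sub implied_end {
    my ($seq, $start) = @_;
    (my $residues = $seq) =~ s/[-.]//g;   # strip gap characters
    return unless length $residues;       # all-gap sequence: no defined end
    return $start + length($residues) - 1;
}

print implied_end('AC--GT.A', 846), "\n";   # 5 residues starting at 846
```

The end is the start plus the number of non-gap residues, minus one. An all-gap sequence has no meaningful end, so the sketch returns nothing in that case.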

Florent


Steffen Heyne wrote:
> Hello,
>
> so I tried to fix the problem with the location. Currently it works for
> me with the following changes:
>
> LocatableSeq.pm
>
> sub get_nse{
>
> ...
>
> 	my $ret;
> 	if ($self->strand() >= 0) {
> 		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
> 	} else {
> 		$ret = $id . $v. $char1 . $end . $char2 . $st ;
> 	}
> 	return $ret;
> }
>
> Then I noticed while using $aln->remove_seq() that it cannot
> remove a seq because it uses a wrong NSE to look up sequences. I changed the
> following:
>
> SimpleAlign.pm
>
> sub remove_seq {
>
> ...
> 	$id = $seq->id();
>     	$start = $seq->start();
>     	$end  = $seq->end();
>
> ## changed code:
>
> 	my $v = $seq->version ? '.'.$seq->version : '';
>     	if ($seq->strand >=0){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
> 	} elsif ($seq->strand == -1){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
> 	}	
> ...
>
> }
>
> The above code in LocatableSeq.pm worked when I read an
> alignment in stockholm format and wrote it out in clustalw format. But
> when I read an alignment in clustalw and wrote it out as stockholm (or
> something else) it didn't work, as the strand is not correctly set in
> ClustalW::next_aln. It works with the following changes:
>
> ClustalW.pm
>
> sub next_aln{
>
> ...
>
> 	my ( $sname, $start, $end, $strand );	## strand added
> 	$strand = 0;				## new, standard = 0???
>     	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys
> %alignments ) {
>         if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
>         	( $sname, $start, $end ) = ( $1, $2, $3 );
> 		$strand = 1;			## new			
> 		if ($start > $end) {		## new
>        		($start, $end, $strand) = ($end, $start, -1); ##new
> 		}				## new
> 	
>       }
>         else {
>             ( $sname, $start ) = ( $name, 1 );
>             my $str = $alignments{$name};
>             $str =~ s/[^A-Za-z]//g;
>             $end = length($str);
>         }
>
>         my $seq = Bio::LocatableSeq->new(
>             -seq   => $alignments{$name},
>             -id    => $sname,
>             -start => $start,
>             -end   => $end,
> 	    -strand=> $strand			## new
>         );
>
> ...
>
> }
>
> So I don't know if I changed things in the correct places. And I
> found them only because I used certain functions. I don't know how broad
> the effect of a changed NSE in LocatableSeq.pm is on other modules and
> functions. But I'm happy with my changes (so far :-)...).
>
> Will you change this to your proposed way in bioperl trunk?
>
> Thanks!
>
> steffen
>
>
> Chris Fields schrieb:
>   
>> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>>
>>     
>>> Hi,
>>>
>>> I'm using Bioperl for my research and it is very useful! Thank you!
>>>
>>> Currently I have a problem with locations tags of sequences. I read in
>>> seed alignments of Rfam (in stockholm format, but I think it is
>>> similar to other formats).
>>>
>>> If the location is like:
>>>
>>> AB194432.1/908-846
>>>
>>> the start/end values are changed to
>>>
>>> $seq->start = 846
>>> $seq->end = 908
>>>
>>> and therefore the new location (e.g.$seq->get_nse) is:
>>>
>>> AB194432.1/846-908
>>>
>>> The $seq->strand tag is correctly set to -1 in this case, but if the
>>> alignment is written out again (clustal, stockholm,...) this strand
>>> info is lost and the sequences have this "wrong" location. But this
>>> information is important in respect to the sequence accession number.
>>>
>>> Is there a way to set the location back to the original one or is this
>>> behavior desired? Any manually setting with $seq->start($val) failed
>>> due to automatic checking.
>>>
>>> I'm using bioperl 1.6.1
>>>
>>> Thanks!
>>>
>>> steffen
>>>       
>> This is a definite bug. We recently discussed amending the NSE format
>> due to this (the subject came up over the last few months or so); it's
>> fallen through the cracks.  Fortunately it is very easy to fix (the
>> relevant method is in LocatableSeq).
>>
>> Does anyone have a problem with me adding this in?  It will change
>> output for only those instances where the strand is -1, so
>>
>> AB194432.1/908-846
>>
>> would be start = 846, end = 908, strand = -1
>>
>> AB194432.1/846-908
>>
>> would be start = 846, end = 908, strand = 1
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
>
>   


From cjfields at illinois.edu  Fri Dec  4 04:16:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 22:16:48 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to
	FASTQ ?
In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
	<320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>
	<2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com>
	<747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
	<2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>
Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu>


On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote:

> 2009/12/3 Chris Fields <cjfields at illinois.edu>:
>> Dan,
>> 
>> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:
> 
> ...
> 
>>> I'm wondering, because with the FlaggedRevs extension you could
>>> basically build a whole release in the wiki. Which would be fun if
>>> nothing else!
>> 
>> I'm not following you there.  Could you elaborate on why that would be beneficial?  I could see (
> 
> I never said it would be beneficial, only that it would be fun.
> 
> http://www.mediawiki.org/wiki/Flaggedrevs

Ah, okay, that makes some sense.  

Just to stay on subject, I committed a fix (r16439) to bioperl-live that addresses the extra newline issue.

chris


From rtbio.2009 at gmail.com  Fri Dec  4 13:57:21 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 4 Dec 2009 14:57:21 +0100
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
Message-ID: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>

Hello all,

I am working on Remote blast. Here, I am trying to get 2 parameters into the
remote blast code. They are:

1. The input sequence that has to be sent to blast

2. Organism (the organism which has to be searched, e.g. Trypanosoma brucei)

When I tried to take the organism parameter as input from the
user, through a web page, Remote blast did not give any results, i.e. it
says that no alignments were found.

But when I hard-coded the organism in the code, it gave me results, i.e.
3 hits.

I could not understand this problem. Could anybody please help me in this
regard?

My code is

sub blastcode
{

$input1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
               print OUTFILE @params;
              close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-Organism' => $organism );

while (my $input = $str->next_seq())

{
#Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

   my $r = $factory->submit_blast($input);

   # my $r = $factory->submit_blast('amino.fa');

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

      #    open(BLASTDEBUGFILE,'>',$debugfile);
       #   print BLASTDEBUGFILE $result->next_hit();
        #  close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);
$factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);
}

Regards,
Roopa.


From cjfields at illinois.edu  Fri Dec  4 14:59:17 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Dec 2009 08:59:17 -0600
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
In-Reply-To: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
References: <c7cac1600912040557t617b0588t5a9ec8f6f1abd5cf@mail.gmail.com>
Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu>

Roopa,

At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports).  See here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155

Also, are the returned hits specific for the genome?  You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why):

http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi
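
One more thing that may be worth checking in the posted code: the ENTREZ_QUERY value is written in single quotes, and Perl does not interpolate variables inside single quotes. The sketch below shows the difference in plain Perl. The two plain hashes are stand-ins for the RemoteBlast HEADER and RETRIEVALHEADER hashes, and the whitespace trimming is my own assumption about web-form input rather than something taken from your code.

```perl
use strict;
use warnings;

my $organism = " Trypanosoma brucei\n";   # stand-in for a value from a web form

# Trim stray whitespace that form input often carries (an assumption here).
$organism =~ s/^\s+|\s+$//g;

# Single quotes: no interpolation, so this literal text is what would be sent.
my $literal = '$organism[ORGN]';

# Concatenation avoids any ambiguity with an array element $organism[...].
my $query = $organism . '[ORGN]';

print "$literal\n";
print "$query\n";

# Plain hashes standing in for %Bio::Tools::Run::RemoteBlast::HEADER and
# %Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER in this sketch.
my (%header, %retrieval_header);
$header{'ENTREZ_QUERY'}           = $query;
$retrieval_header{'ENTREZ_QUERY'} = $query;
```

If the hard-coded version that worked had the organism name typed directly into the string, this difference would explain why only the variable-based version returns nothing, but that is a guess on my part.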

chris 
 
On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I am working on Remote blast. Here, I am trying to get 2 parameters into the
> remote blast code. They are:
> 
> 1. The input sequence that has to be sent to blast
> 
> 2. Organism (the organism which has to be searched, e.g. Trypanosoma brucei)
> 
> When I tried to take the organism parameter as input from the
> user, through a web page, Remote blast did not give any results, i.e. it
> says that no alignments were found.
> 
> But when I hard-coded the organism in the code, it gave me results, i.e.
> 3 hits.
> 
> I could not understand this problem. Could anybody please help me in this
> regard?
> 
> My code is
> 
> sub blastcode
> {
> 
> $input1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $input1;
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>         '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>               print OUTFILE @params;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a paramter
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
> #change a paramter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
> '-Organism' => $organism );
> 
> while (my $input = $str->next_seq())
> 
> {
> #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>   my $r = $factory->submit_blast($input);
> 
>   # my $r = $factory->submit_blast('amino.fa');
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
> 
>     foreach my $rid ( @rids ) {
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>      #    open(BLASTDEBUGFILE,'>',$debugfile);
>       #   print BLASTDEBUGFILE $result->next_hit();
>        #  close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> $factory->save_output($filename);
> 
>       # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> }
> 
> Regards,
> Roopa.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From robert.bradbury at gmail.com  Fri Dec  4 18:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID: <deaa866a0912041027r71c49f58n7d467f050c2f49c6@mail.gmail.com>

Background:
I have been involved in aging research off and on for ~16 years.  My initial
focus was on the eventual decline of the "program" (because DNA has no ECC
and only limited redundancy); my initial work (in the early 1990s)
therefore focused on DNA repair genes (of which there are about 150 in the
human genome) [1,2].  Most recently I have focused on the DNA double-strand
break repair process (NHEJ) as a fundamental cause of aging, because it may
corrupt the genomes of individual cells.  (And as most
programmers would agree -- break the code and you break the program).
 Michael Lieber at USC has estimated that by the time a human is ~70, on the
order of several hundred genes in one's cells have been corrupted (which may
have an indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene,
there are on the order of 18 SNPs and 8 possible phosphorylation sites (not
to mention other potential modification sites).  Add to this the
fact that methionine and tryptophan, and to a lesser extent cysteine, are more
susceptible to single-base mutations (because even a single-base
mutation/repair alters the codon->amino acid coding).  There are
various programs to analyze such proteins for critical sites -- SIFT and
the various programs its site points to.  Now it seems to me that one
could attack this problem by integrating SNPs, mutations, etc. at the
critical sites, where "critical" may or may not coincide with normal SNPs
(which presumably lie mostly at non-critical sites), and flagging those
positions where changing the coding sequence to a non-synonymous amino acid
potentially breaks the protein (the real interpretation of which will not be
understood until population studies are done).
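
To illustrate the point about methionine, tryptophan and cysteine, here is a minimal sketch in plain Perl that counts, for a given codon, how many of its nine single-base substitutions leave the amino acid unchanged. It assumes the standard genetic code (NCBI translation table 1), written in TCAG order; synonymous_neighbours is a name I made up, not a BioPerl function.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# For one codon, count how many of its 9 single-base substitutions
# still encode the same amino acid.
sub synonymous_neighbours {
    my ($codon) = @_;
    my $count = 0;
    for my $pos (0 .. 2) {
        for my $base (@bases) {
            next if $base eq substr($codon, $pos, 1);
            my $mutant = $codon;
            substr($mutant, $pos, 1) = $base;
            $count++ if $code{$mutant} eq $code{$codon};
        }
    }
    return $count;
}

for my $codon (qw(ATG TGG TGT TGC GGG)) {
    printf "%s (%s): %d of 9 substitutions are synonymous\n",
        $codon, $code{$codon}, synonymous_neighbours($codon);
}
```

Under this code table the single codons for Met (ATG) and Trp (TGG) have no synonymous single-base neighbours, each Cys codon has one, and a four-fold degenerate codon such as GGG has three. That is one simple way to rank positions by how exposed they are to point mutations.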

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is
there not a BioPerl function which simply enables a visual interpretation of
the critical sites of the protein?"  I.e. some color-coded representation of
the protein (presumably with some augmented functionality to report
things like probability or statistical information).  I.e. hand the function
a .fasta file and it will give you a visual (colored) analysis of how
critical specific a.a. are -- something which could be used in
genomic or SNP analysis (such as I presume is being done by 23andMe, as
well as other organizations) to begin to separate the variations in the
human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and, to a lesser extent, Perl experience to
contribute to this -- I lack the BioPerl wisdom to make it generally
available.

If anyone has suggestions as to what functions/modules might be of use
(in providing a "single-look" view of the a.a. in a gene whose mutations
may be more or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press
(2006)
2. "Aging of the Genome",  J. Vijg, Oxford University Press (2007)


From maj at fortinbras.us  Sun Dec  6 22:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 6 Dec 2009 17:54:00 -0500
Subject: [Bioperl-l] bioperl-mode new feature: base class browsing
Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife>

Hi All, 
You can now browse pod of the base/parent classes of bioperl modules
with one keystroke using the latest update of bioperl-mode.
See http://bioperl.org/wiki/Emacs_bioperl-mode
Press "B" or "P" while in pod view to get a completion list 
of the parent classes for the module whose pod you're viewing.
cheers, 
MAJ


From mmokrejs at ribosome.natur.cuni.cz  Mon Dec  7 20:33:48 2009
From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=)
Date: Mon, 07 Dec 2009 21:33:48 +0100
Subject: [Bioperl-l] Generalized reciprocal blast
In-Reply-To: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
References: <deaa866a0908260838m3c5abf63j6dc75b9b24899c48@mail.gmail.com>
Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz>

Hi,
  I just stumbled across this older posting ... maybe you want to exploit
SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has
a remote API available.
Martin
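
In case you do end up building it yourself, the core reciprocal-best-hit test is small. Below is a minimal sketch in plain Perl. The gene names and the two best-hit tables are made up, and in practice they would be filled from the two BLAST runs (A against B, and B against A).

```perl
use strict;
use warnings;

# Hypothetical input: best hit of each gene in the other genome, as would be
# extracted from two BLAST runs (A against B, and B against A).
my %best_a_to_b = (repA1 => 'repB1', repA2 => 'repB7', repA3 => 'repB3');
my %best_b_to_a = (repB1 => 'repA1', repB7 => 'repA9', repB3 => 'repA3');

# A pair is reciprocal when each gene is the other's best hit.
sub reciprocal_best_hits {
    my ($a_to_b, $b_to_a) = @_;
    my %pairs;
    for my $gene (sort keys %$a_to_b) {
        my $hit  = $a_to_b->{$gene};
        my $back = $b_to_a->{$hit};
        $pairs{$gene} = $hit if defined $back && $back eq $gene;
    }
    return \%pairs;
}

my $rbh = reciprocal_best_hits(\%best_a_to_b, \%best_b_to_a);
print "$_ <=> $rbh->{$_}\n" for sort keys %$rbh;
```

A tolerance setting like the one you describe could be added by keeping the top few hits per gene instead of only the best one, and accepting a pair when each gene appears in the other's list.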

Robert Bradbury wrote:
> I would like to know whether or not anyone has attempted to create a
> "generalized" reciprocal blast component for BioPerl?
> 
> One sees papers all the time where they discuss running reciprocal blasts to
> compare a new species to an old "standard" species or a set of species or
> running an all-to-all set of comparisons to match up all of the "known"
> proteins from species and determine which are outliers (and therefore
> "novel").  There are also accumulating merged sets in NCBI HomoloGene (which
> seems to be a strict subset (perhaps a dozen) of "well sequenced" genomes)
> and Ensembl (which seems to be working with a much larger set of 40-50
> genomes, some of which may be somewhat incomplete and are certainly poorly
> "explored").
> 
> I have, I believe, seen code "fragments" from various authors, perhaps some
> on the BioPerl list, which perform some major subset of a typical
> "reciprocal blast".
> 
> Now what I am looking for is a relatively generalizable some-to-some
> reciprocal blast utility.  I want to be able to specify the genes (or gene
> family), e.g. some of the ~150 known DNA repair genes.  It would be helpful
> to also specify how "tolerant" the blast "true reciprocal" criteria are.
> There are some genes where there is a very strict 1-to-1 relationship across
> many genomes.  But for genes which involve relatively standard domains, e.g.
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
> example it's more like 5-to-5 and it would be really nice to be able to
> specify the strictness or quality level [1] for "matching" genes (and even
> which genes are to be excluded because they are known to be false
> homologues).
> 
> Then to top this off I want to be able to combine known public databases
> (e.g. HomoloGene / UniGene / Ensembl) with perhaps local private
> databases or database subsets (e.g. emerging or specialized genomes).
> 
> The goal here is of course to determine the precise phylogenetic relationships
> between all of the DNA repair genes and how there may be gain / loss /
> evolution of function that can be related to species characteristics (size,
> longevity, etc.).
> 
> Is there a generalized reciprocal blast component in BioPerl?  Or is it a
> "build-it-yourself" situation (that I have to believe has been built
> probably a few dozen times by various researchers / organizations /
> companies)?
> 
> Thanks,
> Robert Bradbury
> 
> 1. This would be handled in BioPerl with a customizable user function which
> could be tailored to handle specific cases -- for example a function which
> when handed a set of 100 potential "matches" could go through those 100
> matches, identify common domains, and then "re-rate" matches based on
> considerations such as the type and number of common domains, domains being
> in the same order, etc.  I.e. criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.
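
For anyone landing on this thread: the core of the reciprocal-hit test
described in the quoted post can be sketched in a few lines of plain Perl.
This is only an illustration, not an existing BioPerl component. The
function name reciprocal_hits, the layout of the hit tables and the $accept
callback (standing in for the "customizable user function" of footnote 1)
are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical input layout: each table maps a query id to its hits,
# best hit first, e.g. { geneA1 => [ { id => 'geneB1', evalue => 1e-80 } ] }.
# $accept is a user-supplied callback that decides whether a single hit
# is good enough to count (the "strictness" knob).
sub reciprocal_hits {
    my ($a_vs_b, $b_vs_a, $accept) = @_;
    $accept ||= sub { 1 };    # default: accept every hit
    my @pairs;
    for my $qa (sort keys %$a_vs_b) {
        # best acceptable hit of the A query in genome B
        my ($best) = grep { $accept->($_) } @{ $a_vs_b->{$qa} };
        next unless $best;
        # best acceptable hit of that B gene back in genome A
        my ($back) = grep { $accept->($_) } @{ $b_vs_a->{ $best->{id} } || [] };
        push @pairs, [ $qa, $best->{id} ] if $back && $back->{id} eq $qa;
    }
    return @pairs;
}
```

The callback receives one hit at a time, so a stricter or looser definition
of a "true reciprocal" only needs a different sub, for example
sub { $_[0]{evalue} <= 1e-10 }. The many-to-many case (5-to-5 families)
would need more than this best-hit test.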


From robert.bradbury at gmail.com  Mon Dec  7 20:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>

This comment could also have the subject line: "Why does Bioperl/get_sequence
fork at all?  Why are not all operations sequential?"  And if this is a
"default" mode that I'm unaware of -- how do I ever write a reliable BioPerl
script if I have little or no control over what the program uses when it
runs?  I may have days, so I can bear the burden of relatively slow results
(and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a
subset of the NCBI sequences.  It "mostly" works, in that it returns a
seemingly complete .bls result file but when attempting to look at the
sequences (so it can more accurately summarize the information from the
results than a standard blast report allows) it terminates prematurely with
errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
    $seq = get_sequence('GenBank', $accsn);

Now this can be a problem if NCBI/GenBank fails due to load conditions --
but this specific failure (which is repeatable) is most likely due to hitting
the user process limit -- the small blast results work fine; it's only when
the Blast has returned several hundred hits that it runs into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI
queries (to get a sequence) with complete disregard of the environment
(process limits, NCBI limits, etc.).  But I do not know enough about how
this works to point a finger at some specific function.  As a result,
get_sequence results are accumulated, summarized, etc. without ever
having issued the respective wait()-variant calls to collect former children.
[This IMO would clearly be a bug.]

This could be addressed by allowing the BioPerl library to run in 3 modes.
 (1) completely synchronous -- if you fork you wait until it's done and
you collect "it"; if any fork fails then one either collects the process or
switches to the non-conservative mode.
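
To make the "fork, then collect" idea concrete, here is a minimal sketch in
plain Perl of running jobs in child processes while never exceeding a fixed
number of children and always reaping them with waitpid. It is not BioPerl
code and makes no claim about what get_sequence does internally; the name
run_limited and its arguments are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Run each job (a code ref) in a child process, with at most $max
# children alive at once. Every child is collected with waitpid, so no
# uncollected children pile up against the user process limit.
# Returns the number of children reaped.
sub run_limited {
    my ($max, @jobs) = @_;
    $max = 1 if $max < 1;          # $max == 1 is the fully synchronous mode
    my (%running, $reaped);
    $reaped = 0;
    my $reap_one = sub {
        my $pid = waitpid(-1, 0);  # block until some child exits
        $reaped++ if delete $running{$pid};
    };
    for my $job (@jobs) {
        $reap_one->() while keys %running >= $max;
        my $pid = fork();
        die "Couldn't fork: $!" unless defined $pid;
        if ($pid == 0) {           # child: do the work, then leave at once
            $job->();
            POSIX::_exit(0);
        }
        $running{$pid} = 1;        # parent: remember the child
    }
    $reap_one->() while keys %running;
    return $reaped;
}
```

With a limit of 1 each job finishes before the next one starts, which
corresponds to mode (1) above.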

Robert


From cjfields at illinois.edu  Mon Dec  7 21:08:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 7 Dec 2009 15:08:40 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <A36A88C9-D94C-4559-A629-56EB8F374DAC@illinois.edu>

Robert, 

If you use the relevant components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not.  All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly.

See the POD for those specific modules for more information.
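
A rough sketch of what that could look like for the sequence lookups,
fetching one accession at a time. The -retrievaltype option comes from the
Bio::DB::WebDBSeqI POD (as far as I can tell, 'pipeline' is the mode that
forks, while 'tempfile' and 'io_string' do not), so please check the POD of
your installed version. The helper name acc_to_desc is made up for this
example and is not part of BioPerl.

```perl
use strict;
use warnings;

# Look up descriptions sequentially. $db is any object that provides
# get_Seq_by_acc (for example a Bio::DB::GenBank instance). Accessions
# that cannot be retrieved map to undef instead of ending the run.
sub acc_to_desc {
    my ($db, @accessions) = @_;
    my %desc;
    for my $acc (@accessions) {
        # BioPerl reports failures by throwing, so trap them per accession
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        $desc{$acc} = defined $seq ? $seq->desc : undef;
    }
    return \%desc;
}

# Only contacts NCBI when accessions are given on the command line.
if (@ARGV) {
    require Bio::DB::GenBank;
    # assumption: 'tempfile' avoids the forking 'pipeline' retrieval
    my $db   = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
    my $desc = acc_to_desc($db, @ARGV);
    for my $acc (@ARGV) {
        my $text = defined $desc->{$acc} ? $desc->{$acc} : 'NOT RETRIEVED';
        print "$acc\t$text\n";
    }
}
```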

chris

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Dec  7 21:24:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Dec 2009 13:24:54 -0800
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>

Robert -

You seem to be mixing up the remote blast and the sequence retrieval
problems. These messages are related to the remote retrieval of sequences.
  It is hard to tell from your message specifically which modules you
are using or how you are querying NCBI - there are several ways to do
this, either with the NCBI tools or with Bio::DB::GenBank.
  If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to adhere to the wait variant that
NCBI requests, but I don't think the Bio::DB::GenBank get_Seq_by_acc method
does anything of that sort (at least when it was originally written).

I always advocate that if you want highly available and reliable access to
sequences you should download nr or whichever DB you need and use the local
indexing tools for the retrieval.  Once you start doing hundreds of
queries I don't see any good reason to be querying NCBI directly, given
the unreliability of the web and its services. Local databases are faster
and more reliable for most people, so I urge you to take advantage of the
tools which provide local database access with the same APIs.
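
As a small sketch of the local route, assuming the database is available
as a plain FASTA file and indexed with Bio::DB::Fasta (one of several local
indexing modules; the choice and the helper name local_lengths are
assumptions for this example):

```perl
use strict;
use warnings;

# $db is any object that provides get_Seq_by_id (for example a
# Bio::DB::Fasta instance). Returns id => sequence length, with undef
# for ids that are not present in the index.
sub local_lengths {
    my ($db, @ids) = @_;
    my %len;
    for my $id (@ids) {
        my $seq = $db->get_Seq_by_id($id);
        $len{$id} = defined $seq ? $seq->length : undef;
    }
    return \%len;
}

# Usage: perl local_fetch.pl db.fasta ID1 ID2 ...
if (@ARGV > 1) {
    require Bio::DB::Fasta;
    my ($fasta, @ids) = @ARGV;
    my $db  = Bio::DB::Fasta->new($fasta);   # builds or reuses the index
    my $len = local_lengths($db, @ids);
    for my $id (@ids) {
        my $text = defined $len->{$id} ? $len->{$id} : 'not found';
        print "$id\t$text\n";
    }
}
```

No network access or child processes are involved here, so several hundred
lookups are just several hundred index reads.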


I would like to comment that the tone of your posts to the list is
not particularly helpful.   I wonder if you are actually asking for
help or just interested in complaining when things don't work as
you expect? This is a collaborative and volunteer-only project, with
the principle of working together to make a useful toolkit.  We
encourage you to build programs and applications from this base that
suit your needs, but not all things will be directly implemented in
the toolkit if they aren't generic enough (at least that is my
feeling; the other Core devs help with these decisions).
   If there is a useful, generic, and reusable part, we would like that
to be part of the API. Otherwise we suggest a new application that
fits the developer's vision. We encourage you to write (and publish)
that application separately, but we certainly also encourage bug reports
(and fixes) and code contributions for new features where they
can be seen as generally useful.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From Jonas_Schaer at gmx.de  Tue Dec  8 15:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the Fasta.pm module to read in some FASTA files. At first it worked fine, but now I have some FASTA files in a slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
    Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/
DB/Fasta.pm line 1054.


Is there any way to use these FASTA files with different line lengths with this Fasta.pm module, or will I have to change the format of my FASTA files (big databases...)?

Thanks in advance for any help! 

Regards, Jonas


From awitney at sgul.ac.uk  Tue Dec  8 17:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID: <DB3D347F-EB9E-4A59-87D2-3E1A5FACF154@sgul.ac.uk>

Hi,

I have been generating some trees with Phylip (pars) and then  
processing them with Bioperl. These trees are generated by comparing  
multiple strains of a bacterial organism by presence/absence (0/1)  
calls for each gene.

I was wondering if there was any package in Bioperl to try to
determine if any specific genes were associated with specific branches
of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam


From jason at bioperl.org  Tue Dec  8 17:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>

You can run sreformat (HMMER) or the bp_sreformat.pl script in
scripts/utilities (it is also installed when you install the BioPerl scripts):
$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or
$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa


-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Wed Dec  9 04:30:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Dec 2009 22:30:26 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference
Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu>

All,

We will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting on Jan 14-15 in San Diego.  This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13.  The exact day and time are somewhat flexible depending on attendees' schedules.

If you would like to attend, sign up here:

http://www.bioperl.org/wiki/GMOD_2010_Meeting

For those interested in attending the GMOD meeting or PAG:

http://gmod.org/wiki/January_2010_GMOD_Meeting

I can envision the following items popping up:

* Refactoring of Alignment and GFF3/FeatureIO
* Addressing BioPerl's monolithic nature
* Moose and Perl 6
* Documentation

Any others?

chris


From akarger at CGR.Harvard.edu  Wed Dec  9 15:01:45 2009
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 9 Dec 2009 10:01:45 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>

> Is there any way to use these fasta files with different length of
> lines with this fasta.pm module or will i have to change the format
> of my fasta-files(big databases...) ?
> 

Jonas,

It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back.

To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy).

The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like).
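
For reference, the round trip described above boils down to re-wrapping
every record to one fixed line width. A minimal sketch of that idea in
plain Perl follows. It is not the Scriptome code itself; the function name
rewrap_fasta and the default width of 60 are chosen for the example.

```perl
use strict;
use warnings;

# Copy FASTA records from filehandle $in to filehandle $out so that
# every sequence line (except the last of each record) is exactly
# $width characters long. Only one record is held in memory at a time.
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 60;
    my $seq = '';
    my $flush = sub {
        # emit the collected sequence in chunks of at most $width
        print {$out} "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;            # drop newline, CR and trailing blanks
        if ($line =~ /^>/) {           # header: finish the previous record
            $flush->();
            print {$out} "$line\n";
        }
        else {                         # sequence: collect without whitespace
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();
    return;
}

# Usage: perl rewrap.pl in.fa > out.fa   (runs only when a file is given)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    rewrap_fasta($in, \*STDOUT, 60);
    close $in;
}
```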

Let me know if you have a problem.

-Amir Karger
Life Sciences Research Computing, FAS IT
Harvard University


From Kevin.M.Brown at asu.edu  Wed Dec  9 15:26:22 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 9 Dec 2009 08:26:22 -0700
Subject: [Bioperl-l] fasta format
In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>

Even easier to accomplish in one step. Read in the fasta file and output
it right to another fasta file with SeqIO

use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
# next_seq returns one sequence at a time; write_seq writes it re-wrapped
while (my $seq = $in->next_seq){$out->write_seq($seq);}

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  



From Russell.Smithies at agresearch.co.nz  Wed Dec  9 19:44:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Dec 2009 08:44:41 +1300
Subject: [Bioperl-l] fasta format
In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
	<1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>

It's even easier as the script is already written for you :-)

bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa


--Russell




From maj at fortinbras.us  Wed Dec  9 20:18:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 9 Dec 2009 15:18:08 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
	<18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife>

$ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, 
">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas



From kellert at ohsu.edu  Thu Dec 10 00:36:13 2009
From: kellert at ohsu.edu (Tom Keller)
Date: Wed, 9 Dec 2009 16:36:13 -0800
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>

Greetings,
Is there a simple way to map a list of ensembl ids to the NCBI gis?

thanks,
Tom

Thomas (Tom) Keller
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/


From cjfields at illinois.edu  Thu Dec 10 01:59:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 9 Dec 2009 19:59:37 -0600
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu>

Tom,

Probably best to do this via BioMart:

http://www.ensembl.org/biomart/

I would assume you can also do this via the Ensembl Perl API.

Also, have a look at the UniProt ID Mapper:

http://www.uniprot.org/?tab=mapping

chris



From lovebaby39 at gmail.com  Thu Dec 10 14:22:14 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 22:22:14 +0800
Subject: [Bioperl-l] about bioperl issue
Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC>

Dear 

The following is code. 


--------------------------------------------------------------------------------

use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

# $testline2 holds the query sequence (set elsewhere in the script)
my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS' );
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new( -id  => "test_query",
                              -seq => $testline2 );
my $blast_report_rb = $factory_rb->blastall($input_rb);

# Walk results -> hits -> HSPs and print the name, e-value and score
while ( my $result_rb = $blast_report_rb->next_result ) {
    while ( my $hit_rb = $result_rb->next_hit ) {
        while ( my $hsp_rb = $hit_rb->next_hsp ) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue,
                  "\t score = ", $hsp_rb->score, "\n";
        }
    }
}

--------------------------------------------------------------------------------


I know how to get "name", "evalue" and "score", but I don't know how to get the text shown in red (please see the attachment).
------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091210/0431bad7/attachment-0004.txt>

From SMarkel at accelrys.com  Thu Dec 10 14:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red but the three strings in the
pairwise alignment display can be obtained from an HSP using

    $hsp->query_string()
    $hsp->hit_string()
    $hsp->homology_string()
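
As a minimal sketch of how the three accessors fit together: the helper
name format_hsp_alignment below is made up for illustration and is not
part of BioPerl. It only assumes that $hsp responds to query_string(),
homology_string() and hit_string(), as the HSP objects returned by
next_hsp() do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build the three-line
# alignment display from any object that provides the HSP string
# accessors query_string(), homology_string() and hit_string().
sub format_hsp_alignment {
    my ($hsp) = @_;
    return join "\n",
        'Query: ' . $hsp->query_string,
        '       ' . $hsp->homology_string,
        'Sbjct: ' . $hsp->hit_string;
}
```

Inside the innermost loop of the original script this could be called as
print format_hsp_alignment($hsp_rb), "\n"; the coordinates shown in the
BLAST report are not included in this sketch.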

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics




From David.Messina at sbc.su.se  Thu Dec 10 15:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.


Dave


From David.Messina at sbc.su.se  Thu Dec 10 15:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for, I think.

Namely:
   $hsp->query_string()
OR
   $hsp->hit_string()


Dave


On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
> 
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
> 
> Thank you
> 
> Reginald Hsueh
> 
> ------------------------------------------------------------------------------------------------------------------------------
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>            |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
> ------------------------------------------------------------------------------------------------------------------------------
> 


From lovebaby39 at gmail.com  Thu Dec 10 15:53:00 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 23:53:00 +0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <AEA3314B45B14452A4BD1E3A2235AA5D@SHAPC>

Dear Dave Messina

Thank you for your replies.

Reginald Hsueh



From pg4 at sanger.ac.uk  Thu Dec 10 20:50:40 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT)
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
References: <mailman.13.1260464408.29500.bioperl-l@lists.open-bio.org>
Message-ID: <alpine.DEB.1.10.0912102042180.8440@deskpro17122.dynamic.sanger.ac.uk>


If you are mapping Ensembl genes to NCBI genes (via the Ensembl API or
BioMart), please read this recent thread on ensembl-dev:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html

It seems that the Ensembl-to-NCBI gene mapping is done through the
translations, so noncoding genes do not have a corresponding NCBI gene mapped.


   -Pablo



====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin at sanger.ac.uk
====================================================================


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From umjsm at leeds.ac.uk  Fri Dec 11 16:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to
write one chain of a pdb to a file. I have tried a lot of things, but what I
think should work is the following script:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');

for my $chain ($struc->get_chains) {
	if($chain->id eq "A"){
		$new_entry->chain($chain);
		last;
	}
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);

It doesn't. I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of the
Bio::Structure::Entry object requires an object of type Chain as input.

Any solution will be very welcome.

best regards,
Joan


From wkretzsch at gmail.com  Fri Dec 11 19:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files
	generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community.  I've created a perl module that
reads in msOUT files generated by Hudson's ms.  As far as I
understand, there is no SeqIO module to read and output these files?
If so, I propose to create a module that does this.  Any suggestions?

Thanks,
Warren Kretzschmar


From maj at fortinbras.us  Fri Dec 11 19:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT
	filesgenerated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,
I say go for it. You'll want to have a look at
http://bio.perl.org/wiki/Advanced_BioPerl
which explains most of our tips and "policies" for prospective
code contributors, as well as
http://bio.perl.org/wiki/HOWTO:SeqIO
which details SeqIO from the user's perspective. Look
carefully at some Bio::SeqIO::* modules for implementation
details. If you have code to propose, use
http://bugzilla.bioperl.org
and enter a new enhancement, where you can upload
your module for us to review.
MAJ


From bosborne11 at verizon.net  Fri Dec 11 20:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>

Joan,

It looks to me like the first argument to the add_chain() method has
to be a Model object; the second is the Chain itself. See
Structure/Entry.pm, for example. However, if you're seeing some
documentation that says something else, then tell us where; it needs
to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which
has one or more Chains. This allows you to build macromolecular
complexes (an Entry), which could have more than one defined protein
or protein complex (Models).
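
A rough, untested sketch of what building that hierarchy by hand might
look like. The helper name copy_chain_to_new_entry and the model id
'default' are made up for illustration, and the argument order of
add_model(), add_residue() and add_atom() is my reading of
Structure/Entry.pm, so please check it against perldoc for your
installed version. Whether write_structure() accepts an Entry built
this way (it may expect header annotations) is an open question.

```perl
use strict;
use warnings;
use Bio::Structure::Entry;
use Bio::Structure::Model;

# Hypothetical helper: copy one chain (with its residues and atoms)
# from a parsed Entry into a new Entry that holds a single Model.
sub copy_chain_to_new_entry {
    my ($struc, $chain_id) = @_;

    my $new_entry = Bio::Structure::Entry->new( -id => $struc->id );
    my $model     = Bio::Structure::Model->new( -id => 'default' );
    $new_entry->add_model( $new_entry, $model );    # Entry -> Model

    for my $chain ( $struc->get_chains ) {
        next unless $chain->id eq $chain_id;
        $new_entry->add_chain( $model, $chain );    # Model -> Chain
        for my $res ( $struc->get_residues($chain) ) {
            $new_entry->add_residue( $chain, $res );    # Chain -> Residue
            for my $atom ( $struc->get_atoms($res) ) {
                $new_entry->add_atom( $res, $atom );    # Residue -> Atom
            }
        }
        last;
    }
    return $new_entry;
}
```

With the structure parsed as in the original script, the call would be
copy_chain_to_new_entry($struc, 'A'). The point of the sketch is only
that each level is attached to its parent explicitly, with the parent
object passed as the first argument.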

Brian O.



From awitney at sgul.ac.uk  Sun Dec 13 21:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you
see after clustering data.

I was wondering if anyone has managed to do this using bioperl? I can
draw the two separately, but can't quite seem to work out how to put the
two together and get the nodes to line up with the correct row of
clustering data.

Is there any particular module to look at?

thanks for any help

adam


From dhwani1030 at gmail.com  Sat Dec 12 20:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>

Hi,
I am very new to Bioperl, though I am somewhat familiar with Perl.

I write my Perl programs in Notepad++ and run them in cmd.

Now I want to run Bioperl programs. I just installed Bioperl on my
computer, and I have a program using Bioperl modules in Notepad++.

My question is how to run these programs. Can they be run in cmd as well, or
do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.


From eric_donaldson at med.unc.edu  Sun Dec 13 23:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID: <f77787b07d66b.4b252f3c@med.unc.edu>

Hello,

Today I downloaded bioperl 1.6.1 on my new MacBook Pro using fink. I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.


I am a novice at unix and bioperl, so I do not know how to troubleshoot this. Would you please help me?

Thank you,

Eric


Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology



From jason at bioperl.org  Mon Dec 14 01:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f77787b07d66b.4b252f3c@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was  
deprecated several releases ago.
It was replaced with Bio::SearchIO
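
For anyone porting an old BPlite script, a minimal sketch of the
Bio::SearchIO equivalent is below. The helper name
summarize_blast_report is made up for illustration, and the choice of
columns to print is only an assumption about what such a parser would
report.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per hit of a
# BLAST report (query name, hit name, hit significance).
sub summarize_blast_report {
    my ($file) = @_;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $file );
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            print join( "\t",
                $result->query_name, $hit->name, $hit->significance ), "\n";
        }
    }
    return;
}
```

Calling summarize_blast_report('report.bls') on a saved BLAST text
report (the file name is just an example) walks every result and hit;
next_hsp() on each hit gives the per-alignment details if they are needed.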

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Mon Dec 14 04:09:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 20:09:45 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f79059397d7fa.4b255f0b@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>

So did you install perl-5.10, or are you using the system perl? I'm not sure
whether you actually installed bioperl.pm via fink.

Your @INC or $PERL5LIB does point to /sw/lib/perl5, which is one of the
dirs it would have installed in, but I don't think you actually
installed bioperl.

You can try:
$ locate Bio/SearchIO.pm
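
If locate is not set up, the same check can be done from Perl itself.
This is only a sketch using core modules; the helper name
find_module_dirs is made up for illustration. It looks for the module
file in every directory that this particular perl searches, which is
exactly the list printed in the "Can't locate" message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in a search path
# (default: @INC) that contains the given module file.
sub find_module_dirs {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    (my $relpath = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    # Skip code references, which @INC may legitimately contain.
    return grep { !ref($_) && -e File::Spec->catfile( $_, $relpath ) } @dirs;
}

my @found = find_module_dirs('Bio::SearchIO');
print @found
    ? "Bio::SearchIO found in: @found\n"
    : "Bio::SearchIO is not in any \@INC directory\n";
```

Run it with the same perl that runs the failing script; a different
perl binary will have a different @INC.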

We'll see if any of the other OS X/fink gurus on the list can help, or
you can install it via CPAN, I guess.

-jason
On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:

>
> I actually tried a different blastparser that uses Bio::SearchIO and
> got the same message:
>
> Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.new.pl line 8.
> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>
> I suspect there is a path problem, but am not savvy enough to know  
> how to fix it.  I am really just a hacker.... I have several scripts  
> that I use regularly and that I know how to modify, but am lost when  
> they don't work...
>
> Thanks for any help,
>
> Eric
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 8:24 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: bioperl-l at bioperl.org
>
>> Hi Eric -
>>
>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>> was
>> deprecated several releases ago.
>> It was replaced with Bio::SearchIO
>>
>> -jason
>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>
>>> Hello,
>>>
>>> Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I
>>> used the
>>>
>>> fink install bioperl.pm-588 as I could not get it to install using
>>> the perl version 5.10.
>>>
>>> But now I get an error when trying to run a bioperl script.
>>>
>>> Here is the error:
>>>
>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>>> /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5
>>> /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0
>>> /System/Library/Perl/5.10.0/darwin-thread-multi-2level
>>> /System/Library/Perl/5.10.0
>>> /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0
>>> /Network/Library/Perl/5.10.0/darwin-thread-multi-2level
>>> /Network/Library/Perl/5.10.0 /Network/Library/Perl
>>> /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level
>>> /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>
>>>
>>> I am a novice at unix and bioperl so I do not know how to
>>> troubleshoot this, would you please help me?
>>>
>>> Thank you,
>>>
>>> Eric
>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>>
>>> <eric_donaldson.vcf>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From jason at bioperl.org  Mon Dec 14 05:10:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 21:10:54 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To: <f7a30bbc786b3.4b258092@med.unc.edu>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>

Eric -
please CC the bioperl list when responding so others can help - I  
can't be the only answerer.

But since your @INC message doesn't include /sw/lib/perl5/5.8.8/, you
need to make sure that directory is added to your PERL5LIB.
I expect there are help docs on the Perl sites on how to get your
paths in order.

Or you can just install via CPAN which will put it in the right path -  
there are docs on the bioperl website about installing via CPAN.

-jason
On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted  
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ 
> SearchIO.pm  I cannot access it.  Do I need to use the older version  
> of perl?
>
> Would it be better to install with CPAN?  If so, can you send me to  
> a page that has instructions?
>
> Thank you so much!
>
> ERic
>
>
> ----- Original Message -----
> From: Jason Stajich <jason at bioperl.org>
> Date: Sunday, December 13, 2009 11:10 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: BioPerl List <bioperl-l at bioperl.org>
>
>> So you installed perl-5.10 or using system perl?  I'm
>> confused if you
>> actually installed bioperl.pm or not via fink?
>>
>> It seems like since your @INC or $PERL5LIB points to
>> /sw/lib/perl5
>> which is one of the dirs it would have installed in, but I don't
>> think
>> you actually installed bioperl.
>>
>> you can try and do:
>> $ locate Bio/SearchIO.pm
>>
>> We'll see if any of the other osx/fink gurus are on the list
>> that can
>> help or you can install it via CPAN I guess.
>>
>> -jason
>> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:
>>
>>>
>>> I actually tried a different blastparser that uses Bio::SearchIO and
>>> got the same message:
>>>
>>> Can't locate Bio/SearchIO.pm in @INC (@INC contains:
>>> /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5
>>> /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0
>>> /System/Library/Perl/5.10.0/darwin-thread-multi-2level
>>> /System/Library/Perl/5.10.0
>>> /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0
>>> /Network/Library/Perl/5.10.0/darwin-thread-multi-2level
>>> /Network/Library/Perl/5.10.0 /Network/Library/Perl
>>> /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level
>>> /System/Library/Perl/Extras/5.10.0 .) at blastparser.new.pl line 8.
>>> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>>>
>>> I suspect there is a path problem, but am not savvy enough to know
>>> how to fix it.  I am really just a hacker.... I have several scripts
>>> that I use regularly and that I know how to modify, but am lost when
>>> they don't work...
>>>
>>> Thanks for any help,
>>>
>>> Eric
>>>
>>> ----- Original Message -----
>>> From: Jason Stajich <jason at bioperl.org>
>>> Date: Sunday, December 13, 2009 8:24 pm
>>> Subject: Re: [Bioperl-l] problem with install
>>> To: eric_donaldson at med.unc.edu
>>> Cc: bioperl-l at bioperl.org
>>>
>>>> Hi Eric -
>>>>
>>>> Bio::Tools::BPlite is no longer supported in Bioperl - it
>>>> was
>>>> deprecated several releases ago.
>>>> It was replaced with Bio::SearchIO
>>>>
>>>> -jason
>>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Today I downloaded bioperl 1.61 on my new macbook pro using fink.  I
>>>>> used the
>>>>>
>>>>> fink install bioperl.pm-588 as I could not get it to install using
>>>>> the perl version 5.10.
>>>>>
>>>>> But now I get an error when trying to run a bioperl script.
>>>>>
>>>>> Here is the error:
>>>>>
>>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains:
>>>>> /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5
>>>>> /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0
>>>>> /System/Library/Perl/5.10.0/darwin-thread-multi-2level
>>>>> /System/Library/Perl/5.10.0
>>>>> /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0
>>>>> /Network/Library/Perl/5.10.0/darwin-thread-multi-2level
>>>>> /Network/Library/Perl/5.10.0 /Network/Library/Perl
>>>>> /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level
>>>>> /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
>>>>> BEGIN failed--compilation aborted at blastparser.pl line 8.
>>>>>
>>>>>
>>>>> I am a novice at unix and bioperl so I do not know how to
>>>>> troubleshoot this, would you please help me?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>> Eric F. Donaldson, Ph.D.
>>>>> Research Assistant Professor, Ralph Baric Lab
>>>>> University of North Carolina
>>>>> Department of Epidemiology
>>>>>
>>>>>
>>>>>
>>>>> <eric_donaldson.vcf>
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>>
>>>>
>>>
>>> Eric F. Donaldson, Ph.D.
>>> Research Assistant Professor, Ralph Baric Lab
>>> University of North Carolina
>>> Department of Epidemiology
>>>
>>>
>>> <eric_donaldson.vcf>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>>
>>
>
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
> <eric_donaldson.vcf>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From awitney at sgul.ac.uk  Mon Dec 14 09:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
References: <e0c98c760912121204o5640160co6468dfbdd84e4789@mail.gmail.com>
Message-ID: <4B260713.3070402@sgul.ac.uk>


BioPerl programs are just Perl programs, so you should run them in
exactly the same way as your Perl programs, from the command line.

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to Bioperl but I am somewhat familiar to perl though.
> 
> I write my perl programs in Notepad++ and run them in cmd.
> 
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
> 
> My question is how to run these programs? Can they be ran in cmd as well? or
> do I use ppm?
> 
> Please help.
> 
> Thanks,
> -Dhwani Gandhi.


From umjsm at leeds.ac.uk  Mon Dec 14 10:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the add_chain method; I am calling the chain method:

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

If I pass it anything other than an object of type

Bio::Structure::Chain

I get an error like this (the part between the arrows depends on the
argument):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain,
we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK:
Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------


And if I pass a Chain object, I get the error that I told you about.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
my $model = Bio::Structure::Model->new( -id  => '0');

for my $chain ($struc->get_chains) {
        if($chain->id eq "A"){
                $new_entry->add_chain($model,$chain);
                last;
        }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);


But I get an empty pdb

HEADER    DEFAULT CLASSIFICATION                  24-JAN-70   stru
REMARK   1
TER       1          A   0
MASTER
END

I have tried a lot of combinations, but I can't write a single chain to
a file. I don't know what I am doing wrong.

Thanks for helping

regards,
Joan


On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> Joan,
> 
> It looks to me like the first argument to the add_chain() method has  
> to be a Model object, the second is the Chain itself. See Structure/ 
> Entry.pm, for example. However if you're seeing some documentation  
> that says something else then tell us where, it needs to be corrected.
> 
> In Bio::Structure an Entry consists of one or Models, each of which  
> has one or more Chains. This allows you to build macromolecular  
> complexes (an Entry), which could have more than one defined proteins  
> or protein complexes (Models).
> 
> Brian O.
> 
> On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
> 
> > Hello,
> >
> > I am trying to do a very easy think but I don't get it. I want to  
> > write
> > in a file a chain of a pdb. I have try a lot of thinks but what I  
> > think
> > that it should work is the next script:
> >
> > use Bio::Structure::IO;
> > use strict;
> >
> > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format'  
> > =>
> > 'pdb');
> > my $struc = $structio->next_structure;
> >
> > my $new_entry = Bio::Structure::Entry->new( -id  => 'structure_id');
> >
> > for my $chain ($struc->get_chains) {
> > 	if($chain->id eq "A"){
> > 		$new_entry->chain($chain);
> > 		last;
> > 	}
> > }
> >
> > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' =>
> > 'pdb');#
> > $out->write_structure($new_entry);
> >
> > it doesn't. I get the next error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK:
> > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: 
> > 368
> > STACK:
> > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:335
> > STACK:
> > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:391
> > STACK:
> > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ 
> > Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far I understand the documentation, the method chain of the object
> > Bio::Structure::Entry requires an as input an object of type Chain.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan
> >
> 


From fs5 at sanger.ac.uk  Mon Dec 14 12:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format => "EMBL") complains about the missing header, and
if I stick in a fake ID line, it warns about the missing sequence and the
fact that the features don't fit on the sequence (of length 0).
Of course it's not difficult to write my own parser, but I'm sure there
must be a BioPerl way of doing this that I have just overlooked. Thanks
for your help.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From David.Messina at sbc.su.se  Mon Dec 14 14:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.


Dave


PS: I suspect you may already be familiar with it, but for an overview of how to get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation


From eric_donaldson at med.unc.edu  Mon Dec 14 14:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <f77787b07d66b.4b252f3c@med.unc.edu>
	<119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
	<f79059397d7fa.4b255f0b@med.unc.edu>
	<404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
	<f7a30bbc786b3.4b258092@med.unc.edu>
	<7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID: <f750f0a17830d.4b2603e0@med.unc.edu>

Thank you Jason. I appreciate the help.

Eric

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology


-------------- next part --------------
begin:vcard
n:Donaldson;Eric
fn:Eric F. Donaldson, PhD
tel;work:919.966.3881
org:University of North Carolina, School of Medicine;Epidemiology
adr:;;2107 McGavran-Greenberg Hall
CB# 7435
;Chapel Hill;NC;27599;USA
email;internet:eric_donaldson at med.unc.edu
email;home;internet:viralnerd at gmail.com
title:Research Assistant Professor
version:2.1
end:vcard

From umjsm at leeds.ac.uk  Mon Dec 14 16:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
	<CB45FDCA-32C7-4F49-8F5B-A6342D6E3A41@verizon.net>
	<1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,


To write a single PDB chain to a file, I have had to add it atom by
atom to a new structure:

use strict;
use warnings;
use Bio::Structure::IO;
use Bio::Structure::Entry;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", -format => 'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new(-id => 'structure_id');

for my $model ($struc->get_models) {
	$new_struct->add_model($model);
	for my $chain ($struc->get_chains) {
		# Copy only chain A, residue by residue and atom by atom.
		next unless $chain->id eq "A";
		$new_struct->add_chain($model, $chain);
		for my $res ($struc->get_residues($chain)) {
			$new_struct->add_residue($chain, $res);
			for my $atom ($struc->get_atoms($res)) {
				$new_struct->add_atom($res, $atom);
			}
		}
		last;	# chain A has been copied; skip the remaining chains
	}
	last;	# only the first model is needed
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", -format => 'pdb');
$out->write_structure($new_struct);

I suppose there should be a more elegant way to do it.

If someone knows one and can explain it, I would be very grateful.

kind regards, 
Joan



From gowthaman.ramasamy at sbri.org  Mon Dec 14 19:16:32 2009
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 14 Dec 2009 11:16:32 -0800
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
Message-ID: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>


Hi All,
I have a list of GO terms and would like to pull the GO accessions for them.
I can easily do the reverse using get_term("GO::00000051").

But can someone tell me how to get the GO accessions from GO terms, e.g. retrieve the GO accession for "citrulline metabolic process"?


Thanks very much,
Gowtham


From lsbrath at gmail.com  Mon Dec 14 19:41:39 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Mon, 14 Dec 2009 14:41:39 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>

Hello,

I have loaded BioPerl-1.6.0 onto my Mac. When I run my script, I get the
following error message:

Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
/sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at project_example.pl line 4.
BEGIN failed--compilation aborted at project_example.pl line 4.

I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
Any ideas?

MEB


From scott at scottcain.net  Mon Dec 14 19:47:05 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 14 Dec 2009 14:47:05 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com>

Hi Mgavi,

I think Jason may have already started helping, but the question is:
is SeqIO.pm anywhere in those directories?  If not, why not?  If so,
why can't the perl you are using find it?  Do you have more than one
instance of perl on your machine (fairly likely if you are using a
fink-installed BioPerl)?  When you execute your script, which perl are
you using?
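
A quick way to answer these questions is a small check script. This is
only a sketch: the helper name dirs_with_module is made up for
illustration and is not part of BioPerl. Run it with the same perl that
runs project_example.pl:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the directories from
# the given list that actually contain the module file.
sub dirs_with_module {
    my ($relpath, @dirs) = @_;
    # @INC may hold code refs (hooks); skip anything that is not a plain path.
    return grep { !ref($_) && -e "$_/$relpath" } @dirs;
}

# $^X is the path of the perl binary that is executing this script.
print "perl in use: $^X\n";

my @hits = dirs_with_module('Bio/SeqIO.pm', @INC);
my $msg  = @hits
    ? "Bio/SeqIO.pm found in: @hits"
    : "Bio/SeqIO.pm is not under any directory in \@INC";
print "$msg\n";
```

If nothing is found even though /sw/lib/perl5 is listed, it is worth
checking whether /sw/lib/perl5/Bio/SeqIO.pm exists at all, since perl
looks for the Bio/ directory directly under each @INC entry.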

Scott


On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite <lsbrath at gmail.com> wrote:
> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
> Any ideas?
>
> MEB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From bosborne11 at verizon.net  Mon Dec 14 19:45:35 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 14 Dec 2009 14:45:35 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net>

Mgavi,

So there's a directory called /sw/lib/perl5/Bio? Or is it called  
something else?

Brian O.


On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote:

> Hello,
>
> I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the
> following error message:
>
> Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
> /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
> at project_example.pl line 4.
> BEGIN failed--compilation aborted at project_example.pl line 4.
>
> I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
> Any ideas?
>
> MEB


From jason at bioperl.org  Mon Dec 14 21:42:09 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 Dec 2009 13:42:09 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <C56E1117A61A4835B8E794D34A157F5B@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
	<C04B9F93-3DC1-4743-BDAD-C67E6A5BC3E2@bioperl.org>
	<C56E1117A61A4835B8E794D34A157F5B@jonas>
Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org>

You can read the sreformat man page from Sean Eddy, or use it exactly as
I showed you:
sreformat fasta filename > filename.new

You can also use the first example, which is a BioPerl solution.
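
If neither tool is at hand, the same effect can be sketched in plain
Perl. This is not what sreformat or bp_sreformat.pl do internally, just
an illustration of re-wrapping the sequence lines so they all have the
same length; the name rewrap_fasta and the default width of 60 are
arbitrary choices for the example:

```perl
use strict;
use warnings;

# Re-wrap FASTA text so that every sequence line of a record has the same
# width, except possibly the last one. Header lines are passed through.
sub rewrap_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my ($out, $seq) = ('', '');

    # Emit the sequence collected so far in fixed-width chunks.
    my $flush = sub {
        for my $chunk ($seq =~ /(.{1,$width})/g) {
            $out .= "$chunk\n";
        }
        $seq = '';
    };

    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>/) {       # a new record starts
            $flush->();
            $out .= "$line\n";
        }
        else {                     # sequence data: drop whitespace, collect
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();
    return $out;
}

# Usage sketch: slurp a file, print the re-wrapped version.
# my $text = do { local $/; open my $fh, '<', 'yourfile.fa' or die $!; <$fh> };
# print rewrap_fasta($text, 60);
```

For big databases this slurps the whole file into memory, so the
command-line tools above are probably the better choice there.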

-jason
On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote:

> Hi Jason,
> thank you very much for your answer.
> I am sorry to bother you again, but I'm afraid I need some help with
> that because I don't see how to use sreformat.
> I haven't managed to write a script that works.
>
> thank u again :)
> jonas
>
>
> ----- Original Message ----- From: "Jason Stajich" <jason at bioperl.org>
> To: "Jonas Schaer" <Jonas_Schaer at gmx.de>
> Cc: <bioperl-l at bioperl.org>
> Sent: Tuesday, December 08, 2009 6:44 PM
> Subject: Re: [Bioperl-l] fasta format
>
>
>> you can run
>> sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (or
>> that is installed when you install the Bioperl scripts)
>> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
>> # rename it back
>> $ mv yournewfile.fa yourfile.fa
>>
>> or
>> $ sreformat fasta yourfile.fa > yournewfile.fa
>> $ mv yournewfile.fa yourfile.fa
>>
>>
>> -jason
>> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:
>>
>>> Hi there,
>>> I have a little question concerning bioperl. I have
>>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
>>> in some fasta files. first it worked fine, but now i have some
>>> fastafiles in slightly different format (not all lines have the same
>>> length!).
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Each line of the fasta entry must be the same length except the last.
>>>   Line above #49 '
>>> ..' is 28 != 101 chars.
>>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
>>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
>>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
>>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
>>> STACK main::readfasta blast_eval.pm:174
>>> STACK toplevel blast_eval.pm:83
>>> -------------------------------------
>>>
>>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.
>>>
>>>
>>> Is there any way to use these fasta files with different line
>>> lengths with this fasta.pm module, or will I have to change the format
>>> of my fasta files (big databases...)?
>>>
>>> Thanks in advance for any help!
>>>
>>> Regards, Jonas
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From cjfields at illinois.edu  Tue Dec 15 01:23:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 14 Dec 2009 19:23:05 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>

All,

The output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:

1) Stockholm Rfam reverses start and end if the strand == -1
          
   chrY/598-1

2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end

   rice-3(+)/16598648-16600199

The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and to deprecate the simple non-stranded NSE?
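
To make the two variants concrete, here is a rough sketch of what each
formatter could look like. The sub names are placeholders, not existing
or proposed Bio::LocatableSeq methods, and treating an unset or zero
strand as plus is an assumption made only for this example:

```perl
use strict;
use warnings;

# Variant 1 (Rfam/Stockholm style): swap start and end on the minus strand.
sub nse_swapped {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if ($strand || 0) == -1;
    return "$name/$start-$end";
}

# Variant 2 (Gbrowse_syn style): put the strand symbol after the name.
sub nse_explicit {
    my ($name, $start, $end, $strand) = @_;
    my $symbol = ($strand || 0) == -1 ? '-' : '+';
    return $name . "($symbol)/$start-$end";
}

print nse_swapped('chrY', 1, 598, -1), "\n";
print nse_explicit('rice-3', 16598648, 16600199, 1), "\n";
```

The two print statements reproduce the example strings given above for
each variant.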

chris


From bernd.web at gmail.com  Tue Dec 15 08:37:44 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 15 Dec 2009 09:37:44 +0100
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
	<C74BCF10.9BF6%gowthaman.ramasamy@sbri.org>
Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com>

Dear Gowthaman,

A non-BioPerl solution: the Ontology Lookup service at EBI. It also
provides a web service interface.

http://www.ebi.ac.uk/ontology-lookup/

"citrulline metabolic process" has to be selected from the pull-down
list in the interactive page. This will return the ID (GO:0000052) and
additional info:

definition:       The chemical reactions and pathways involving citrulline,
                  N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins.
preferred name:   citrulline metabolic process
exact synonym:    citrulline metabolism
subset:           Prokaryotic GO subset
xref_definition:  ISBN:209853 "Oxford Dictionary of Biochemistry and
                  Molecular Biology"

The webservice is described at
http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do
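
If you would rather do the lookup locally, and assuming you have a copy
of the ontology as an OBO flat file, a name-to-accession table can be
built with a few lines of plain Perl. This is only a sketch: it reads
just the id: and name: lines of [Term] stanzas, ignores synonyms and
obsolete flags, and obo_name_to_id is a made-up name:

```perl
use strict;
use warnings;

# Build a lower-cased name -> accession lookup from OBO-formatted text.
sub obo_name_to_id {
    my ($obo_text) = @_;
    my (%id_for, $in_term, $id);
    for my $line (split /\r?\n/, $obo_text) {
        if ($line =~ /^\[(\w+)\]/) {              # stanza header, e.g. [Term]
            $in_term = ($1 eq 'Term');
            undef $id;
        }
        elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif ($in_term && defined $id && $line =~ /^name:\s*(.+?)\s*$/) {
            $id_for{lc $1} = $id;
        }
    }
    return \%id_for;
}

# Usage sketch (the file name is an assumption):
# my $text = do { local $/; open my $fh, '<', 'gene_ontology.obo' or die $!; <$fh> };
# my $lookup = obo_name_to_id($text);
# print $lookup->{lc 'citrulline metabolic process'}, "\n";
```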


Regards,
Bernd


On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy
<gowthaman.ramasamy at sbri.org> wrote:
>
> Hi All,
> I have a list of GO terms and would like to pull the GO accessions for them.
> I can easily do the reverse using get_term("GO::00000051").
>
> But can someone tell me how to get the GO accessions from GO terms, e.g. retrieve the GO accession for "citrulline metabolic process"?
>
>
> Thanks very much,
> Gowtham
>
>


From fs5 at sanger.ac.uk  Tue Dec 15 10:38:40 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 15 Dec 2009 10:38:40 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
	<0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk>

Thanks Dave,
good to know that I haven't overlooked something bleedingly obvious in
Bioperl that already does this :-)
No problem, I have already implemented a simple parser to do it, which
works fine for my files. 
Thanks
Frank


On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote:
> Hi Frank,
> 
> You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12
> 
> Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.
> 
> It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
> 
> 
> Dave
> 
> 
> PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:
> 
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
> 
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From rmb32 at cornell.edu  Tue Dec 15 15:09:43 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 15 Dec 2009 07:09:43 -0800
Subject: [Bioperl-l] AGI's fpc stuff:  Bio::Map::Physical, Bio::MapIO::fpc,
	etc
Message-ID: <4B27A6B7.6090709@cornell.edu>

Hi all,

Recently I caught an interesting thing related to making GFF files out
of FPC maps built recently using Bio::MapIO::fpc.  All of the 
coordinates in the resulting GFF3 and the sizes of the contigs and 
clones seem to be dilated by 4x from where they should be.

This didn't happen with some earlier FPC datasets I ran through these 
modules.

I haven't gone through any of this very thoroughly, but I notice in 
Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my 
$basepair = 4096', and the routine goes on to use $basepair as a sort of 
multiplier for converting the native physical map units into basepairs 
for GFF-style output.

This makes me wonder if the newer FPC datasets coming out require a 
different $basepair value, maybe 1024?  Are the original authors of 
these modules still around on this list?
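
The arithmetic behind that guess, as a tiny sketch. The position of 250
map units is made up, units_to_bp is just an illustrative helper, and
1024 is only my guess at the right multiplier, not a confirmed value:

```perl
use strict;
use warnings;

# Convert physical-map units to base pairs with a given multiplier.
sub units_to_bp {
    my ($units, $bp_per_unit) = @_;
    return $units * $bp_per_unit;
}

my $units = 250;                          # made-up clone position
my $old   = units_to_bp($units, 4096);    # multiplier hard-coded in print_gffstyle()
my $guess = units_to_bp($units, 1024);    # unconfirmed alternative
printf "4096: %d bp, 1024: %d bp, ratio %.1f\n", $old, $guess, $old / $guess;
```

A fixed factor of 4096/1024 = 4 between the two would match the 4x
dilation I am seeing, which is why 1024 looks plausible.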

Rob

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From tristan.lefebure at gmail.com  Tue Dec 15 17:18:26 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 15 Dec 2009 12:18:26 -0500
Subject: [Bioperl-l] ncurses and bioperl?
Message-ID: <200912151218.26357.tristan.lefebure@gmail.com>

Hello,

(Be careful: the following is a very naive question)

Something that I find myself missing is a simple way to look 
at alignments and trees on remote machines where I don't 
have access to X. Since, 
	(1) one can make wonderful terminal programs like screen 
and emacs by using ncurses, 
	(2) that alignment and tree objects are already well 
handled in bioperl, and 
	(3) that there is a CPAN Curses module; 

doing 1+2+3, may I dream of a curse/bioperl perl program to 
render alignment and trees? I suppose a plain C program 
would be much better, but well I am a biologist...

Thanks,

--Tristan


From jason at bioperl.org  Tue Dec 15 17:50:52 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 15 Dec 2009 09:50:52 -0800
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <AEFA51CB-0070-4A1F-9FE3-DA4810129398@bioperl.org>

Not to say this isn't a good idea, but currently for viewing trees in a
terminal I would use retree from PHYLIP. For short-read alignments,
samtools tview or Gambit (MarthLab) works great, and something like
RALEE can be used for viewing multiple sequence alignments (though it is
targeted at editing RNA alignments):
  http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/
  http://dx.doi.org/10.1093/bioinformatics/bth489

So there are prior examples you would be able to learn from
if you still wanted to roll your own here.

-jason
On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote:

> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org


From roy.chaudhuri at gmail.com  Tue Dec 15 17:47:26 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 15 Dec 2009 17:47:26 +0000
Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com>
References: <200912151218.26357.tristan.lefebure@gmail.com>
Message-ID: <4B27CBAE.5000303@gmail.com>

Hi Tristan,

Not a Bioperl solution, but retree from the Phylip package displays 
trees in a terminal.

Roy.

On 15/12/2009 17:18, Tristan Lefebure wrote:
> Hello,
>
> (Be careful: the following is a very naive question)
>
> Something that I find myself missing is a simple way to look
> at alignments and trees on remote machines where I don't
> have access to X. Since,
> 	(1) one can make wonderful terminal programs like screen
> and emacs by using ncurses,
> 	(2) that alignment and tree objects are already well
> handled in bioperl, and
> 	(3) that there is a CPAN Curses module;
>
> doing 1+2+3, may I dream of a curse/bioperl perl program to
> render alignment and trees? I suppose a plain C program
> would be much better, but well I am a biologist...
>
> Thanks,
>
> --Tristan


From nml5566 at gmail.com  Tue Dec 15 21:37:30 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 15 Dec 2009 15:37:30 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>

Is the Bio::Ontology::OBOEngine module working or being currently
maintained? I tried following the documentation in the module:

 use Bio::Ontology::OBOEngine;

 my $parser = Bio::Ontology::OBOEngine->new
               ( -file => "gene_ontology.obo" );

 my $engine = $parser->parse();

But it throws an error when I run the file, saying 'Can't locate object
method "parse"'. Does anyone have any experience getting this module
working? Or is there an alternative BioPerl module to extract terms and
relationships out of sequence ontology files?


From hlapp at drycafe.net  Tue Dec 15 22:05:10 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 15 Dec 2009 17:05:10 -0500
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>

That shouldn't happen I suppose, but you're not really supposed to use  
the engine directly. Rather, it will be used as a backing parser by the  
Bio::OntologyIO parser you choose. Have you tried that route and found  
it not to work?

	-hilmar

On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:

> Is the Bio::Ontology::OBOEngine module working or being currently
> maintained? I tried following the documentation in the module:
>
> * use Bio::Ontology::OBOEngine;
>
> my $parser = Bio::Ontology::OBOEngine->new
>               ( -file => "gene_ontology.obo" );
>
> my $engine = $parser->parse();
>
> *But, it throws an error when I run the file saying 'Can't locate  
> object
> method "parse" '. Does anyone have any experience getting this module
> working; or, is there any alternative bioperl module to extract  
> terms and
> relationships out of sequence ontology files?
>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Wed Dec 16 09:58:16 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Dec 2009 10:58:16 +0100
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <DB8FB8FF-7DCE-4718-9E17-856F09AE1F46@sbc.su.se>

I'd tend to be inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.)

It's a question of whether to optimize human-readability over machine-readability: option 2 favors the former over the latter, and option 1 the reverse.

Whichever way you go, I think

> a new method that creates this, and deprecate[s] out simple non-stranded NSE

would be great.


Dave


From maj at fortinbras.us  Wed Dec 16 12:51:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 16 Dec 2009 07:51:24 -0500
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>

I'm with Dave; option 1 is cleaner. The only problem might be the automatic 
interpretation of older output as always plus strand, but presumably these would 
have had to record the strandedness explicitly elsewhere, so they would be 
updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers MAJ
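
To make the option 1 convention concrete, here is a rough sketch of how a
reader could recover the strand from such a string. parse_nse is a made-up
name, not an existing Bio::LocatableSeq method, and treating start == end
as plus strand is an assumption (a one-base feature is ambiguous under
this convention):

```perl
use strict;
use warnings;

# Parse "name/from-to"; a 'from' larger than 'to' is read as minus strand.
sub parse_nse {
    my ($nse) = @_;
    # Greedy (.+) so that a '/' inside the name stays part of the name.
    my ($name, $from, $to) = $nse =~ m{^(.+)/(\d+)-(\d+)$} or return;
    my $strand = $from > $to ? -1 : 1;
    my ($start, $end) = $strand == -1 ? ($to, $from) : ($from, $to);
    return { name => $name, start => $start, end => $end, strand => $strand };
}

my $loc = parse_nse('chrY/598-1');
print "$loc->{name} $loc->{start}..$loc->{end} strand $loc->{strand}\n";
```

Older plus-strand output such as name/1-598 parses unchanged, which is the
backward-compatibility point above.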
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 14, 2009 8:23 PM
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes


> All,
>
> The current output for NSE format (Name/Start-End) via 
> Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have 
> seen two variations of NSE that incorporate strandedness:
>
> 1) Stockholm Rfam reverses start and end if the strand == -1
>
>   chrY/598-1
>
> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>
>   rice-3(+)/16598648-16600199
>
> The former breaks fewer things within BioPerl, but the latter seems more 
> explicit.  Any preferences?  Do we want a new method that creates this, and 
> deprecate out simple non-stranded NSE?
>
> chris
>
> 


From tuco at pasteur.fr  Wed Dec 16 14:14:28 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 15:14:28 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO
	(Genbank)
Message-ID: <4B28EB44.3080006@pasteur.fr>

Hi,

I wrote a small GenBank parser a few months ago, before BioPerl release 
1.6.0.
I tried to use my code once again but now the output of my parser is empty.
It looks like the annotation of seqfeatures is not filled in anymore.

Here is the code I used previously:

while(my $seq = $streamer->next_seq()){

     #We only want to retrieve CDS features...
     foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
         print $ofh join("#",
                         $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
                         $feat->annotation()->get_Annotations('gene')
                           ? $feat->annotation()->get_Annotations('gene')      # Gene name
                           : $feat->annotation()->get_Annotations('locus_tag'),
                         $feat->annotation()->get_Annotations('product'),      # Description
                        ),"\n";
     }
}

$feat is a Bio::SeqFeature::Generic object

If I print Dumper($feat->annotation()) here is the output :

$VAR1 = bless( {
                  '_typemap' => bless( {
                                         '_type' => {
                                                      'comment' => 'Bio::Annotation::Comment',
                                                      'reference' => 'Bio::Annotation::Reference',
                                                      'dblink' => 'Bio::Annotation::DBLink'
                                                    }
                                       }, 'Bio::Annotation::TypeManager' ),
                  '_annotation' => {}
                }, 'Bio::Annotation::Collection' );

Have some changes been made to the way the annotation object is populated?

Thanks for any clue, and sorry if my question looks stupid.

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From cjfields at illinois.edu  Wed Dec 16 15:09:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 16 Dec 2009 09:09:56 -0600
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <4B28EB44.3080006@pasteur.fr>
References: <4B28EB44.3080006@pasteur.fr>
Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>

Emmanuel,

The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation objects.  The way this was implemented was considered unsatisfactory for various reasons, so we reverted to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):

for my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    for my $tag ($feat_object->get_all_tags) {
        print "  tag: ", $tag, "\n";
        for my $value ($feat_object->get_tag_values($tag)) {
            print "    value: ", $value, "\n";
        }   
    }
}

You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.

chris

On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:

> Hi,
> 
> I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0.
> I tried to use my code once again but now the output of my parser is empty.
> It looks like Annotation from seqfeatures is not filled anymore.
> 
> Here is the code I used previously:
> 
> while(my $seq = $streamer->next_seq()){
> 
>    #We only want to retrieve CDS features...
>    foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>        print $ofh join("#",
>                        $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>                        $feat->annotation()->get_Annotations('gene')
>                          ? $feat->annotation()->get_Annotations('gene')      # Gene name
>                          : $feat->annotation()->get_Annotations('locus_tag'),
>                        $feat->annotation()->get_Annotations('product'),      # Description
>                       ),"\n";
>    }
> }
> 
> $feat is a Bio::SeqFeature::Generic object
> 
> If I print Dumper($feat->annotation()) here is the output :
> 
> $VAR1 = bless( {
>                 '_typemap' => bless( {
>                                        '_type' => {
>                                                     'comment' => 'Bio::Annotation::Comment',
>                                                     'reference' => 'Bio::Annotation::Reference',
>                                                     'dblink' => 'Bio::Annotation::DBLink'
>                                                   }
>                                      }, 'Bio::Annotation::TypeManager' ),
>                 '_annotation' => {}
>               }, 'Bio::Annotation::Collection' );
> 
> Have some changes been made into the way annotation object is populated?
> 
> Thanks for any clue and sorry if my question look stupid
> 
> Regards
> 
> Emmanuel
> 
> -- 
> -------------------------
> Emmanuel Quevillon
> Biological Software and Databases Group
> Institut Pasteur
> +33 1 44 38 95 98
> tuco at_ pasteur dot fr
> -------------------------
> 


From tuco at pasteur.fr  Wed Dec 16 15:37:45 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 16:37:45 +0100
Subject: [Bioperl-l] Data missing into Annotation object
 using	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <4B28FEC9.1080509@pasteur.fr>

On 12/16/2009 04:09 PM, Chris Fields wrote:
> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation.  The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default.  You can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>      print "primary tag: ", $feat_object->primary_tag, "\n";
>      for my $tag ($feat_object->get_all_tags) {
>          print "  tag: ", $tag, "\n";
>          for my $value ($feat_object->get_tag_values($tag)) {
>              print "    value: ", $value, "\n";
>          }
>      }
> }
>
> You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
>    
Hi Chris

Thanks for the info.
I have indeed reverted to using $feat->get_tag_values() and it works as it 
did previously.
For my small problem I can keep this solution, which is well suited to 
it.
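
For anyone hitting the same thing, here is a sketch of how the
row-building step can look with tag-value pairs. It is an illustration
rather than my exact code: first_tag and cds_row are made-up helper
names, and the only feature methods used are has_tag() and
get_tag_values(). The has_tag() check is there because get_tag_values()
throws when the tag is missing:

```perl
use strict;
use warnings;

# Return the first value of a tag, or undef if the feature lacks the tag.
sub first_tag {
    my ($feat, $tag) = @_;
    return unless $feat->has_tag($tag);
    my ($value) = $feat->get_tag_values($tag);
    return $value;
}

# Build one '#'-joined row: locus_tag, gene (falling back to locus_tag), product.
sub cds_row {
    my ($feat) = @_;
    my $locus   = first_tag($feat, 'locus_tag');
    my $gene    = first_tag($feat, 'gene');
    my $product = first_tag($feat, 'product');
    $gene = $locus unless defined $gene;
    return join('#', map { defined $_ ? $_ : '' } $locus, $gene, $product);
}

# Usage inside the original loop:
# foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()) {
#     print $ofh cds_row($feat), "\n";
# }
```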

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


From sung at bio.cc  Wed Dec 16 17:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

Wanted to get pubmed identifier from a PDB file using Bio::Structure,
so hacked the code.
Knew that Bio::Structure::IO::pdb.pm get relevant info from either
'JRNL' or 'REMARK 1'.
However could not see any actual code parsing 'PMID'.

From pdb.pm, what I see:

sub _read_PDB_jrnl {
...
           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

sub _read_PDB_remark_1 {
...
               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
...
}

From my script, I did:

($struc->annotation->get_Annotations('reference'))[0]->authors
($struc->annotation->get_Annotations('reference'))[0]->title

or

my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
for my $key (keys %{$hash_ref}) {
   print $key,": ",$hash_ref->{$key},"\n";
}

Is there any plan to include code that extracts the 'PMID'?
Or did I miss something?
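
For what it's worth, the kind of thing I have in mind is roughly the
following. It is only a sketch outside of pdb.pm, it assumes the file
carries a 'JRNL PMID' sub-record (older files may not), and
pmid_from_pdb_lines is a made-up name:

```perl
use strict;
use warnings;

# Return the PubMed ID from the first 'JRNL PMID' line, or undef if none.
sub pmid_from_pdb_lines {
    my (@lines) = @_;
    for my $line (@lines) {
        return $1 if $line =~ /^JRNL\s+PMID\s+(\d+)/;
    }
    return;
}

# Usage sketch (the file name is an assumption):
# open my $fh, '<', 'structure.pdb' or die $!;
# my $pmid = pmid_from_pdb_lines(<$fh>);
# print defined $pmid ? "PMID: $pmid\n" : "no PMID record\n";
```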

Cheers,
Sung


From nml5566 at gmail.com  Wed Dec 16 19:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To: <F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
	<F57C4A58-8945-4162-B442-51CA72BA3810@drycafe.net>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes, I did find that and it works very well. Now I'm wondering: is
it possible to search for similar terms using a string instead of a
Bio::Ontology term object? For example, I'd like to search for the synonym
"transcription start site" and have it return all similar terms. But it
throws an error if I pass in a simple query like that.
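
The fallback I can think of is to walk all terms myself and compare
strings, roughly like the sketch below. terms_matching is a made-up
helper, not a BioPerl method; it only relies on get_all_terms() on the
ontology and name() / get_synonyms() on each term, and it does a
case-insensitive substring match because I am not sure exactly how the
synonym strings are stored:

```perl
use strict;
use warnings;

# Return all terms whose name or any synonym contains the query string.
sub terms_matching {
    my ($ontology, $query) = @_;
    my $needle = lc $query;
    my @hits;
    for my $term ($ontology->get_all_terms) {
        my @labels = grep { defined $_ } ($term->name, $term->get_synonyms);
        push @hits, $term if grep { index(lc $_, $needle) >= 0 } @labels;
    }
    return @hits;
}

# Usage sketch (the file name is an assumption):
# use Bio::OntologyIO;
# my $ont = Bio::OntologyIO->new(-format => 'obo', -file => 'so.obo')->next_ontology;
# print $_->identifier, "\t", $_->name, "\n"
#     for terms_matching($ont, 'transcription start site');
```

This is a linear scan over every term, so it is fine for a one-off query
but probably not what I want for many lookups.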

-Nathan

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> That shouldn't happen I suppose, but you're not supposed really to use the
> engine directly. Rather it will be used as a backing parser by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
>        -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
>> Is the Bio::Ontology::OBOEngine module working or currently being
>> maintained? I tried following the documentation in the module:
>>
>> use Bio::Ontology::OBOEngine;
>>
>> my $parser = Bio::Ontology::OBOEngine->new
>>              ( -file => "gene_ontology.obo" );
>>
>> my $engine = $parser->parse();
>>
>> But it throws an error when I run the file saying 'Can't locate object
>> method "parse" '. Does anyone have any experience getting this module
>> working; or, is there any alternative bioperl module to extract terms and
>> relationships out of sequence ontology files?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>


From cjfields1 at gmail.com  Thu Dec 17 00:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>

Howdy from Google Groups


From cjfields1 at gmail.com  Thu Dec 17 01:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the
creation of a mirror for the BioPerl mail list, if the last post
didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web
interface or via email.  Have fun!

chris


From ocarnorsk138 at gmail.com  Thu Dec 17 01:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
References: <bf909fab-ea92-40b6-8092-07292989766e@c3g2000yqd.googlegroups.com>
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields <cjfiel... at gmail.com> wrote:
> Howdy from Google Groups


From p.j.a.cock at googlemail.com  Thu Dec 17 10:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>

On Dec 17, 1:01 am, Chris Fields <cjfiel... at gmail.com> wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email.  Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of
an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed
to the mailing list to post (since the mailing list normally only
allows subscribers to post)?

Peter


From David.Messina at sbc.su.se  Thu Dec 17 12:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.


> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?


I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.


Dave


From cjfields at illinois.edu  Thu Dec 17 13:21:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 07:21:46 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>
Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu>


On Dec 17, 2009, at 6:25 AM, Dave Messina wrote:

> Very nice, Chris and Hilmar! That'll be great.
> 
> 
> 
>> Does this mean a Google Groups user doesn't have to be subscribed
>> to the mailing list to post (since the mailing list normally only
>> allows subscribers to post)?
> 
> 
> I think that's right. From the Google groups page:
> 
>> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
> 
> 
> 
> 
> Dave

It is moderated by user to deal with spam.  Hilmar's already a manager/co-owner, and either of us can add more as needed.

chris


From hlapp at drycafe.net  Thu Dec 17 14:52:33 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 17 Dec 2009 09:52:33 -0500
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>


On Dec 17, 2009, at 5:50 AM, Peter wrote:

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post


Yes. They can post through the Google Groups web interface.

The email address for mirrored groups is the one of the list being  
mirrored though, bioperl-l at bioperl.org in this case, and so in order  
to post by email you still have to be subscribed at the bioperl-l  
list. At least that's what the docs at Google say.

I haven't yet tried posting to the group at the bioperl-l at  
googlegroups dot com email under an email address that isn't  
subscribed to bioperl-l at bioperl dot org. Maybe it actually would  
work, contrary to docs.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From jay at jays.net  Thu Dec 17 17:05:24 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 17 Dec 2009 11:05:24 -0600
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
	<a5bac50c-a3c6-4f6f-a66c-1ff13cc3220b@26g2000yqo.googlegroups.com>
	<56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net>
Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net>

On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote:
> I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs.

In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. 

Here is the configuration set I recommend:

   http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png

Your membership rolls will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so.

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From robert.bradbury at gmail.com  Thu Dec 17 19:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>

Just to close out the issue of bioperl forking (in particular accesses to
external databases through get_sequence) which involves individual database
sub-modules and not collecting its children.

As it turns out, the code does do an explicit fork, apparently so that the
child process can read from the database while the parent process
manipulates the data as it becomes available.  Now, one could argue that a
threaded model might be better, since threads are now fairly standard OS
tools in current environments.

But I couldn't find any functions which actually wait for the forked process
(presumably because they are created for "future" use).  Nor is there any
indication in the documentation I've found (which is spread across the web)
or the Wiki explaining that "creating child processes" is how these
functions work and that one *needs* to collect those children after each use,
or else zombie processes will accumulate, which on "reasonable" systems with
per-user process limits will create problems for proper program functioning.
Nor (it would appear) does the parent process set up a SIGCHLD "catcher"
which could collect the processes once they exit (which I expect, in the case
of "get_sequence", would be after closing of the socket which actually
fetched the sequence from GenBank).

It can be resolved easily enough by adding a call after each use of these
functions:
   $kid = waitpid(-1, WNOHANG);
But typically, as a programmer, I should not be responsible for having to
clean up the leftovers of library calls (unless said cleanup requirements
are clearly documented).
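
A minimal sketch of the caller-side workaround described above, assuming a
POSIX-like system. WNOHANG is not a Perl builtin; it is exported by the POSIX
module, so the one-line call needs that import to compile under strict.
leaky_call() is a hypothetical stand-in for a library routine that forks
without waiting; it is not BioPerl code.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Reap every child that has already exited, without blocking.
# Returns the number of children collected.
sub reap_children {
    my $reaped = 0;
    # waitpid returns a pid (> 0) for each exited child, 0 if children
    # exist but none has exited yet, and -1 if there are no children.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Hypothetical stand-in for a library call that forks and never waits.
sub leaky_call {
    defined(my $pid = fork) or die "Couldn't fork: $!";
    exit 0 if $pid == 0;    # child: finish immediately
    return $pid;            # parent: returns without waiting
}

leaky_call() for 1 .. 5;
sleep 1;                    # give the children time to exit
my $count = reap_children();
print "reaped $count children\n";
```

Looping until waitpid stops returning a positive pid matters: a single call
collects at most one child, so one call per library use can still fall behind.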


But to a "newbie" using the functions, coming from a functional background
(C), not an OO background (which at least I would tend to view as a wart on
the otherwise robust Perl language), there are two problems:
1. The lack of documentation and examples explaining how the functions work
and how they must be handled at a higher level (by executing explicit wait
system calls).
2. The lack of code in the BioPerl functions to deal with the forked
processes which they create.  Functional programmers have a perspective --
if you create it -- you have to clean it up.  It would appear that in the
transition to OO programming (or perhaps simply for expediency) that detail
was left out of both (either/and) the documentation and the code.  From this
standpoint one could view garbage collectors as being fundamentally evil --
because they gloss over the fact that programmers should know what they are
doing and when they are doing it.

So, everywhere in the documentation where there is a get_sequence call (or
anything which accesses an external database which causes a fork to occur)
there should be a modification as I have outlined above -- or else the code
should be corrected so orphaned children are always collected and not
allowed to accumulate.


From robert.bradbury at gmail.com  Thu Dec 17 20:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>

Oh, yes, in case it was not clear, the fork call which fails is in
DB/WebDBSeqI.pm: line 722
     defined(my $pid = fork)
          or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limits for the
user (due to accumulated background processes which are uncollected).

And it could be resolved by simply executing a waitpid call for
prior orphaned children before forking [1].  But such a succinct solution
would violate "functional" programming rules -- clean up what you create --
instead they would tend to fall into the OO camp -- "Oh don't worry the
garbage collector will take care of it".  Green programming is a little less
cavalier.
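
For comparison, a sketch of what library-side cleanup could look like: the
routine that forks also waits on that specific child once it has read the
data. fetch_via_child() is a hypothetical illustration of the pattern, not
the code in DB/WebDBSeqI.pm, and the payload stands in for a fetched record.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a fetch that streams data from a forked child.
sub fetch_via_child {
    my ($payload) = @_;
    pipe(my $reader, my $writer) or die "Couldn't create pipe: $!";
    defined(my $pid = fork) or die "Couldn't fork: $!";

    if ($pid == 0) {            # child: write the data and exit
        close $reader;
        print {$writer} $payload;
        close $writer;
        exit 0;
    }

    close $writer;              # parent: read until the child closes its end
    my $data = do { local $/; <$reader> };
    close $reader;

    waitpid($pid, 0);           # collect this specific child; no zombie remains
    my $status = $? >> 8;       # child's exit code
    return ($data, $status);
}

my ($data, $status) = fetch_via_child(">seq1\nACGT\n");
print $data;
```

Waiting on the known pid, rather than on -1, keeps the routine from
accidentally collecting children that the calling program started itself.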

Robert

1. IMO, a very very real problem with programming today is that there is no
connection between programmers and the cost of their programs.  How many
programmers know the instruction cycle time of their computers, what does an
instruction cost in terms of W consumed, W wasted (heat generation),
fruitless scanning over uncollected zombie processes, etc.  It may be that
only programmers who grew up in the era when CPU cycles were expensive
(300 ns/cycle), and who know what each instruction required in terms of
cycles, consider these perspectives.  Now things (CPU use, processor use,
etc.) tend to be swept under the rug, and that appears to be the case with
the standard implementation of BioPerl.  The documentation does not clearly state
that additional sub-processes may be created and need to be collected.  You
are providing a utility that only works "this much".  And guess what -- I
happen to have run into the "this".


From cjfields at illinois.edu  Thu Dec 17 20:25:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:25:56 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
Message-ID: <BFDD2A52-FB3D-4CC4-A5BF-C53A3DAC9C41@illinois.edu>

Robert,

I have previously outlined specifically why you are seeing the fork issue, and a possible solution.  IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast.  Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank.  Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime.

Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X.  We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution.
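
A minimal sketch of the temp-file route mentioned above, assuming BioPerl is
installed and NCBI is reachable. It follows the Bio::DB::GenBank synopsis; the
accession is only an example, and nothing here has been tested against the
fork problem reported in this thread.

```perl
use strict;
use warnings;
use Bio::DB::GenBank;

# 'tempfile' buffers the download in a temporary file rather than
# streaming it from a separate child process.
my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                               -format        => 'Fasta');

# Example accession only; substitute your own.
my $seq = $gb->get_Seq_by_acc('J00522');
print $seq->display_id, "\t", $seq->length, "\n";
```

The resulting sequence object can then be handed to
Bio::Tools::Run::RemoteBlast in a separate step.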

My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'.  What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so.  The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect).  Please keep that in mind.

chris

On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote:

> Just to close out the issue of bioperl forking (in particular accesses to
> external databases through get_sequence) which involves individual database
> sub-modules and not collecting its children.
> 
> As it turns out, the code does do an explicit fork, apparently so that the
> child process can read from the database while the parent process
> manipulates the data as it becomes available.  Now, one could argue that a
> threaded model might be better, since threads are now fairly standard OS
> tools in current environments.
> 
> But I couldn't find any functions which actually wait for the forked process
> (presumably because they are created for "future" use).  Nor is there any
> indication in the documentation I've found (which is spread across the web)
> or the Wiki explaining that "creating child processes" is how these
> functions work and that one *needs* to collect those children after each use,
> or else zombie processes will accumulate, which on "reasonable" systems with
> per-user process limits will create problems for proper program functioning.
> Nor (it would appear) does the parent process set up a SIGCHLD "catcher"
> which could collect the processes once they exit (which I expect, in the case
> of "get_sequence", would be after closing of the socket which actually
> fetched the sequence from GenBank).
> 
> It can be resolved easily enough by adding a call after each use of these
> functions:
>   $kid = waitpid(-1, WNOHANG);
> But typically, as a programmer, I should not be responsible for having to
> clean up the leftovers of library calls (unless said cleanup requirements
> are clearly documented).
> 
> 
> But to a "newbie" using the functions, coming from a functional background
> (C), not an OO background (which at least I would tend to view as a wart on
> the otherwise robust Perl language), there are two problems:
> 1. The lack of documentation and examples explaining how the functions work
> and how they must be handled at a higher level (by executing explicit wait
> system calls).
> 2. The lack of code in the BioPerl functions to deal with the forked
> processes which they create.  Functional programmers have a perspective --
> if you create it -- you have to clean it up.  It would appear that in the
> transition to OO programming (or perhaps simply for expediency) that detail
> was left out of both (either/and) the documentation and the code.  From this
> standpoint one could view garbage collectors as being fundamentally evil --
> because they gloss over the fact that programmers should know what they are
> doing and when they are doing it.
> 
> So, everywhere in the documentation where there is a get_sequence call (or
> anything which accesses an external database which causes a fork to occur)
> there should be a modification as I have outlined above -- or else the code
> should be corrected so orphaned children are always collected and not
> allowed to accumulate.


From cjfields at illinois.edu  Thu Dec 17 20:29:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 17 Dec 2009 14:29:10 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit
	restrictions
In-Reply-To: <deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
References: <deaa866a0912071241k6dd833bft5c063b0ec192279d@mail.gmail.com>
	<39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
	<deaa866a0912171142ja0fd0do91835372b04a856@mail.gmail.com>
	<deaa866a0912171223v4a5db828ge6b68b889f2570e@mail.gmail.com>
Message-ID: <FF6F8AAD-FBBE-4FAD-BB88-59A779CC7131@illinois.edu>

On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote:

> Oh, yes, in case it was not clear, the fork call which fails is in
> DB/WebDBSeqI.pm: line 722
>     defined(my $pid = fork)
>          or $self->throw("'Couldn't fork: $!");

Okay, that's a bit more helpful.

> And of course that is because Linux has reached the process limits for the
> user (due to accumulated background processes which are uncollected).

Right, but again, we need to check this in a cross-platform compatible way.

> And it could be resolved by simply executing a waitpid call for
> prior orphaned children before forking [1].  But such a succinct solution
> would violate "functional" programming rules -- clean up what you create --
> instead they would tend to fall into the OO camp -- "Oh don't worry the
> garbage collector will take care of it".  Green programming is a little less
> cavalier.
> 
> Robert
> 
> 1. IMO, a very very real problem with programming today is that there is no
> connection between programmers and the cost of their programs.  How many
> programmers know the instruction cycle time of their computers, what does an
> instruction cost in terms of W consumed, W wasted (heat generation),
> fruitless scanning over uncollected zombie processes, etc.  It may be that
> only programmers who grew up in the era when CPU cycles were expensive
> (300 ns/cycle), and who know what each instruction required in terms of
> cycles, consider these perspectives.  Now things (CPU use, processor use,
> etc.) tend to be swept under the rug, and that appears to be the case with
> the standard implementation of BioPerl.  The documentation does not clearly state
> that additional sub-processes may be created and need to be collected.  You
> are providing a utility that only works "this much".  And guess what -- I
> happen to have run into the "this".

Um, yeah.  Okay.

chris


From robfsouza at gmail.com  Fri Dec 18 18:07:34 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Fri, 18 Dec 2009 13:07:34 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
Message-ID: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>

Hi,

I've been dealing with an apparent bug in the output of NCBI's BLAST
programs (blastall, blastpgp) which sometimes produces output like the
one below.
I think I've managed to produce a work around for Bioperl blast.pm
parser and would like to contribute it to Bioperl.
The fix is based on blast.pm from the CVS tree (downloaded some months
ago...) and is attached to this message.
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem
to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
          hypothetical protein [Nasonia vitripennis]
         Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
           P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
          ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
          LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
             DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
          +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
          ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                 +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091218/3771d91c/attachment-0004.obj>

From cjfields at illinois.edu  Fri Dec 18 18:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <DC79216C-9DD8-47AE-876F-7BBAEC6C43CB@illinois.edu>

Robson, 

Any chance you could check this against SVN?  We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris


On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
> 
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
> 
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> 
>> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
>           hypothetical protein [Nasonia vitripennis]
>          Length = 1774
> 
>  Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
>  Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)
> 
> Query: 0   -
> 
> Sbjct: 328 P                                                            328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 0
> 
> Sbjct: 328                                                              328
> 
> Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
>            P PP +   + P       KTK+      K+P  K         +
> Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376
> 
> Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
>           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
> Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432
> 
> Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
>           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
> Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491
> 
> Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
>              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
> Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548
> 
> Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
>           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
> Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602
> 
> Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
>           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
> Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661
> 
> Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
>                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
> Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
> <blast_patched.pm>


From biopython at maubp.freeserve.co.uk  Fri Dec 18 23:00:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 18 Dec 2009 23:00:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>

On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson

Do you have a complete example of this kind of funny output?
This problem has also been reported with blastpgp for the
Biopython parser. I'd love an example for our unit tests
(probably worth doing in BioPerl too). Could you upload a
test case here?:

http://bugzilla.open-bio.org/show_bug.cgi?id=2927

Thanks!

Peter @ Biopython


From biopython at maubp.freeserve.co.uk  Sat Dec 19 11:19:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 11:19:53 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Thank you,

Peter


From maj at fortinbras.us  Sat Dec 19 19:52:45 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 19 Dec 2009 14:52:45 -0500
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
Message-ID: <F7E9AD08646A44D3AB29A4504A725095@NewLife>

Hi All, 

Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
is at beta in the bioperl-run trunk. It wraps all the programs of the 
NCBI's new blast+-2.2.22 suite 
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
and integrates them, allowing you to create, mask, and query 
databases from within a single factory object. See the HOWTO
http://www.bioperl.org/wiki/HOWTO:BlastPlus
for the usual usage and implementation details.

Happy coding--
MAJ 


From David.Messina at sbc.su.se  Sat Dec 19 20:34:10 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 19 Dec 2009 21:34:10 +0100
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se>

Sweet! Thanks, Mark.


Dave


From cjfields at illinois.edu  Sat Dec 19 22:44:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 16:44:46 -0600
Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment
In-Reply-To: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
References: <F7E9AD08646A44D3AB29A4504A725095@NewLife>
Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu>

Very nice!  We'll definitely give it a try here (along with the requisite feedback, of course).

chris

On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote:

> Hi All, 
> 
> Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus,
> is at beta in the bioperl-run trunk. It wraps all the programs of the 
> NCBI's new blast+-2.2.22 suite 
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
> and integrates them, allowing you to create, mask, and query 
> databases from within a single factory object. See the HOWTO
> http://www.bioperl.org/wiki/HOWTO:BlastPlus
> for the usual usage and implementation details.
> 
> Happy coding--
> MAJ 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Sun Dec 20 04:59:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 19 Dec 2009 22:59:38 -0600
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife>
References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu>
	<6723123C0ABD447190639AE1F5D1A6A7@NewLife>
Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu>

I think option 1 is cleaner as well. It was very easy to add, so I have committed it to main trunk; I consider the old behavior a bug, since one can lose strand information when round-tripping data (original data with a -1 strand would be converted to +1).

I'll work out the test failures on trunk along the way (to ensure they're due to erroneous test data and not something else).
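
For illustration, option 1 (the Rfam convention of swapping start and end when the strand is -1) can be sketched in plain Perl. The helper names make_nse and parse_nse are hypothetical and are not the Bio::LocatableSeq API; the sketch only shows why the strand survives a round trip under this convention.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::LocatableSeq: they only
# illustrate the Rfam-style convention (option 1 in this thread).

# Build Name/Start-End, swapping the coordinates for the minus strand.
sub make_nse {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand == -1;
    return "$name/$start-$end";
}

# Parse it back; start > end is read as strand -1.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $start, $end) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "Not an NSE string: $nse";
    my $strand = 1;
    if ($start > $end) {
        ($start, $end) = ($end, $start);
        $strand = -1;
    }
    return ($name, $start, $end, $strand);
}

my $nse = make_nse('chrY', 1, 598, -1);
print "$nse\n";                            # chrY/598-1
print join(',', parse_nse($nse)), "\n";    # chrY,1,598,-1
```

One limitation of this convention is visible in the sketch: a one-residue feature (start equal to end) cannot carry its strand this way and is read back as +1.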

chris

On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote:

> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Monday, December 14, 2009 8:23 PM
> Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes
> 
> 
>> All,
>> 
>> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness.  I have seen two variations of NSE that incorporate strandedness:
>> 
>> 1) Stockholm Rfam reverses start and end if the strand == -1
>> 
>>  chrY/598-1
>> 
>> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>> 
>>  rice-3(+)/16598648-16600199
>> 
>> The former breaks fewer things within BioPerl, but the latter seems more explicit.  Any preferences?  Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
>> 
>> chris


From e.osimo at gmail.com  Sun Dec 20 18:19:37 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Sun, 20 Dec 2009 19:19:37 +0100
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>

Hello everyone,
I have a very particular problem: I'd like to draw different SNPs in a
single track, with a glyph that lets me see their importance graphically.
For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like the
first drawn small and the last one big, with the ones in between sized
accordingly.
I'd also be satisfied with a color gradient.
What I cannot do is set an option such as -height in the
Bio::SeqFeature::Generic->new call that I use for each of my objects,
instead of in the add_track section.
If I set it in the add_track section, all the glyphs are then the same
size (or color).
If, on the other hand, I add a different track for each object, my picture
becomes too big.

Please, help!
Thanks
Emanuele


From ajmackey at gmail.com  Sun Dec 20 18:41:14 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Sun, 20 Dec 2009 13:41:14 -0500
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com>

You can set the height as a callback sub, rather than a constant -- the
callback will get passed the feature about to be drawn, from which you can
calculate the "importance", and return the desired height, dynamically.
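
A minimal sketch of that suggestion follows. It assumes each SNP feature stores its importance (1 to 10) in the feature score, which is an assumption and not something stated in this thread; the name height_for_feature and the pixel range are likewise made up for illustration. The drawing part only runs if Bio::Graphics is installed.

```perl
use strict;
use warnings;

# Hypothetical callback: maps an importance of 1..10, assumed to be
# stored in the feature's score, to a glyph height of 4..22 pixels.
sub height_for_feature {
    my ($feature) = @_;
    my $importance = $feature->score || 1;
    $importance = 1  if $importance < 1;
    $importance = 10 if $importance > 10;
    return 2 + 2 * $importance;
}

# Draw only when Bio::Graphics and BioPerl are available.
if (eval { require Bio::Graphics; require Bio::SeqFeature::Generic; 1 }) {
    my @snps = map {
        Bio::SeqFeature::Generic->new(
            -start        => 100 * $_,
            -end          => 100 * $_,
            -score        => $_,
            -display_name => "snp$_",
        )
    } 1 .. 10;

    my $panel = Bio::Graphics::Panel->new(-length => 1100, -width => 800);
    $panel->add_track(
        \@snps,
        -glyph   => 'generic',
        -bgcolor => 'blue',
        -height  => \&height_for_feature,   # called with each feature
    );
    my $png = $panel->png;                  # image data, ready to write out
    print length($png), " bytes of PNG\n";
}
```

The same pattern should also work for -bgcolor if a color gradient is preferred, with the callback returning a color name instead of a number.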

-Aaron

On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo <e.osimo at gmail.com> wrote:

> Hello everyone,
> I have a very particular problem: I'd like to draw in a single track
> different SNPs with a glyph that allows me to see graphically their
> importance.
> For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the
> first depicted small, and the last one big, with the ones in between with
> according sizes.
> I'd be satisfied also with a color gradient.
> What I cannot do is to set the option -height , for example, instead than
> in
> the add_track section, in the Bio::SeqFeature::Generic->new that I use for
> each of my objects.
> If I set it in the add_track section, all the glyphs are then of the same
> size (or color).
> If, otherwise, I add a different track for each object, my picture becomes
> too big.
>
> Please, help!
> Thanks
> Emanuele


From robfsouza at gmail.com  Sat Dec 19 11:06:16 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 19 Dec 2009 06:06:16 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>

Hi Peter,

I just uploaded my example. I also reported this bug to the NCBI
developers and I hope they can fix it, since it is easy to reproduce.
I just forgot to mention the blastpgp version: 2.2.18
Best,
Robson

On Fri, Dec 18, 2009 at 6:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
> <robfsouza at gmail.com> wrote:
>> Hi,
>>
>> I've been dealing with an apparent bug in the output of NCBI's BLAST
>> programs (blastall, blastpgp) which sometimes produces output like the
>> one below.
>> I think I've managed to produce a work around for Bioperl blast.pm
>> parser and would like to contribute it to Bioperl.
>> The fix is based on blast.pm from the CVS tree (downloaded some months
>> ago...) and is attached to this message.
>> Best,
>> Robson
>
> Do you have a complete example of this kind of funny output?
> This problem has also been reported with blastpgp for the
> Biopython parser. I'd love an example for our unit tests
> (probably worth doing in BioPerl too). Could you upload a
> test case here?:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>
> Thanks!
>
> Peter @ Biopython
>


From biopython at maubp.freeserve.co.uk  Mon Dec 21 15:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
References: <af6a4f100912181006p4c3089aco6af70ed659724535@mail.gmail.com>
	<af6a4f100912181007w7507699qa1614ba522f63500@mail.gmail.com>
	<320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
	<af6a4f100912190306n38e5e1acrc03412835982ecc2@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of
it "by hand", it would be very tricky to try and parse as a special
case. It really does look like a bug in BLAST to me. The alignment
includes an initial pair, a leading gap in the query (with a coordinate
of zero), plus a residue from the match sequence (with a sensible
coordinate). The alignment statistics include this (extra) pair in
the alignment length.

You said you were using blastpgp version 2.2.18, so I tried this
with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast \
  -num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.

I might therefore expect the NCBI to say "yes, this is a bug in
the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance
release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost
Christmas/New Year so we may not expect an answer until Jan.

Peter


From maj at fortinbras.us  Mon Dec 21 18:52:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 13:52:01 -0500
Subject: [Bioperl-l] test fail
Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife>

fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)

t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
#          got: '1..4'
#     expected: 'complement(5..8)'

t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
#          got: 'complement(5..8)'
#     expected: '1..4'
# Looks like you failed 2 tests of 51.

MAJ


From cjfields at illinois.edu  Mon Dec 21 19:20:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 13:20:32 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>

Saw that from the other day (LocatableSeq commit).  I'll check it out.

chris

On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:

> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
> 
> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
> #          got: '1..4'
> #     expected: 'complement(5..8)'
> 
> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
> #          got: 'complement(5..8)'
> #     expected: '1..4'
> # Looks like you failed 2 tests of 51.
> 
> MAJ


From scott at scottcain.net  Mon Dec 21 20:02:20 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 21 Dec 2009 15:02:20 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>

Hi All,

Today it was pointed out to me that the Bio::Graphics documentation
links on the BioPerl wiki are broken, no doubt because Bio::Graphics
is no longer part of bioperl-core (is that how it should be referred
to?).  Anyway, the question is: what is the right way to rectify this
problem?  Since other things may get broken out in the future, I
suppose we should get some sort of standard established.  Can a
release of Bio::Graphics be placed somewhere on the BioPerl wiki
server to be processed?

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 20:22:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 14:22:39 -0600
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>

We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links.  Shouldn't be too hard to do.

chris

On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:

> Hi All,
> 
> Today it was pointed out to me that the Bio::Graphics documentation
> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
> is no longer part of bioperl-core (is that how it should be referred
> to?).  Anyway, the question is: what is the right way to rectify this
> problem?  Since other things may get broken out in the future, I
> suppose we should get some sort of standard established.  Can a
> release of Bio::Graphics be placed somewhere on the BioPerl wiki
> server to be processed?
> 
> Thanks,
> Scott
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Dec 21 21:12:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 15:12:45 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
	<E44B0982-64FB-47EC-AF87-48305997D7AE@illinois.edu>
Message-ID: <A396F39A-76BC-44B4-8302-4C622257E6ED@illinois.edu>

T'was a bad test call.  I basically changed the test to pull each feature directly by the primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly.
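
For reference, the coordinate arithmetic that such a test checks can be sketched in plain Perl. The sequence length of 8 is an assumption inferred from the failing test output (1..4 versus complement(5..8)); revcomp_location and location_string are hypothetical helpers, not SeqUtils methods.

```perl
use strict;
use warnings;

# Hypothetical helper: map a feature location onto the reverse
# complement of a sequence of length $len. Coordinates are 1-based
# and inclusive; the strand flips.
sub revcomp_location {
    my ($start, $end, $strand, $len) = @_;
    return ($len - $end + 1, $len - $start + 1, -$strand);
}

# Render a location in the style of the test output.
sub location_string {
    my ($start, $end, $strand) = @_;
    my $range = "$start..$end";
    return $strand < 0 ? "complement($range)" : $range;
}

my $len = 8;    # assumed sequence length
print location_string(revcomp_location(1, 4, 1, $len)), "\n";    # complement(5..8)
print location_string(revcomp_location(5, 8, -1, $len)), "\n";   # 1..4
```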

chris

On Dec 21, 2009, at 1:20 PM, Chris Fields wrote:

> Saw that from the other day (LocatableSeq commit).  I'll check it out.
> 
> chris
> 
> On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:
> 
>> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>> 
>> t/SeqTools/SeqUtils..........................NOK 46/51#   Failed test at t/SeqTools/SeqUtils.t line 275.
>> #          got: '1..4'
>> #     expected: 'complement(5..8)'
>> 
>> t/SeqTools/SeqUtils..........................NOK 47/51#   Failed test at t/SeqTools/SeqUtils.t line 276.
>> #          got: 'complement(5..8)'
>> #     expected: '1..4'
>> # Looks like you failed 2 tests of 51.
>> 
>> MAJ


From maj at fortinbras.us  Mon Dec 21 21:27:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:27:25 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife>

I've modified Template:Doclink ; if you now do

{{Doclink|Bio::Graphics|cpan}}

you'll get a page with only the cpan link.

{{Doclink|Bio::SeqIO}}

etc. works as usual.
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Mon Dec 21 21:34:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:34:40 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
	<6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife>

Also, applied the new Doclink to Bio::Graphics on wiki.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Cain" <scott at scottcain.net>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation


> We can come up with some standard wiki template for those modules no longer in 
> svn, maybe with just CPAN links.  Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?).  Anyway, the question is: what is the right way to rectify this
>> problem?  Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established.  Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research


From maj at fortinbras.us  Tue Dec 22 02:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--

We didn't plan it, but we added it anyway: see revision 16559 of 
bioperl-live/trunk.
You can then do
$pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;
and even
$doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;
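
A slightly fuller sketch of the same idea, looping over every reference rather than just the first. It assumes a bioperl-live checkout that includes the revision above; reference_ids is a hypothetical helper name, and the PDB file name is taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the PubMed ID and DOI from every
# 'reference' annotation of a Bio::Structure entry.
sub reference_ids {
    my ($struct) = @_;
    my @ids;
    for my $ref ($struct->annotation->get_Annotations('reference')) {
        push @ids, {
            title  => $ref->title,
            pubmed => $ref->pubmed,   # may be empty if the entry has no PMID
            doi    => $ref->doi,      # may be empty if the entry has no DOI
        };
    }
    return @ids;
}

# Usage: perl script.pl entry.pdb
if (@ARGV) {
    require Bio::Structure::IO;
    my $io     = Bio::Structure::IO->new(-file => $ARGV[0], -format => 'pdb');
    my $struct = $io->next_structure;
    for my $id (reference_ids($struct)) {
        printf "PMID: %s  DOI: %s\n", $id->{pubmed} || 'n/a', $id->{doi} || 'n/a';
    }
}
```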

Thanks for the heads-up!
cheers,
MAJ
----- Original Message ----- 
From: "Sungsam Gong" <sung at bio.cc>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations


> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
>           $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>           $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>           $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>           $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>           $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>           $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
>               $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>               $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>               $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>               $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>               $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>               $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
> for my $key (keys %{$hash_ref}) {
>   print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Any plan to include a code chopping 'PMID' out?
> Or did I miss something?
>
> Cheers,
> Sung


From dan.kortschak at adelaide.edu.au  Tue Dec 22 03:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid
alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
bioperl-run tree).

I have 90% of what I want included in the module and would like some
advice from more experienced bioperlers. Feedback on approach is also
welcomed (this is my first significant wrapper, and after a long gap
from writing modules, so I am rusty). The module has ended up being
significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

     1. Informal tests run correctly (outside the t/ tree and Test
        harness), but formal Test harness tests fail for reasons I
        cannot understand. (The module is still lacking a lot of tests,
        but since things were failing in the harness I have placed them
        as a lower priority and have been working to my micro-script
        tests - yes, bad form.)
     2. I am having a big problem with IPC::Run for one of the
        executables (the module can call 5 different executables for 7
        commands), bowtie-maptool (module command 'map'). All the other
        commands tested (this excludes bowtie-maqconvert [convert
        command]) work fine, but maptool fails with an illegal seek -
        presumably due to the redirection handling? I have no idea how
        to resolve this, so help would be greatly appreciated (a small
        script that demonstrates the use that results in the failure is
        below).

There will be provision for returning a Bio::Assembly::IO object through
samtools in the finished module, but currently the
Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
	-command             => 'single',
	-max_seed_mismatches => 2,
	-seed_length         => 28,
	-max_qual_mismatch   => 70,
	-sam_format          => 0
	);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
	-command             => 'map'
	);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

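# Print the contents of the map file, followed by its line count
open(my $in, '<', $map) or die "Cannot open $map: $!";
my @lines = <$in>;
close $in;
print @lines;
print "\n\n", scalar(@lines), "\n";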


From maj at fortinbras.us  Tue Dec 22 05:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <F7513FBADF944B51823A5F22FFA85911@NewLife>

Hey Dan, 
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could 
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'     => [qw( ind seq #out )],
    'paired'     => [qw( ind seq seq2 #out )],
    'crossbow'   => [qw( ind seq #out )],
    'build'      => [qw( ref out )],
    'inspect'    => [qw( ind >#out )],
    'convert'    => [qw( bwt out bfa )],
-    'map'        => [qw( bwt #out )]
+    'map'        => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.
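
To separate the two possibilities, the redirect can also be tried against IPC::Run directly, outside the wrapper. The sketch below is only an illustration: it uses the running perl as a stand-in child process instead of bowtie-maptool, and run_to_file is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run a command with IPC::Run, sending its
# standard output to $outfile (what a '>' filespec boils down to).
# Returns the file contents, or undef if IPC::Run is not installed.
sub run_to_file {
    my ($cmd, $outfile) = @_;
    eval { require IPC::Run; 1 } or return;
    IPC::Run::run($cmd, '>', $outfile)
        or die "Command failed with status $?";
    open(my $fh, '<', $outfile) or die "Cannot read $outfile: $!";
    local $/;                       # slurp the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}

my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# Stand-in for the real tool: a child perl that writes to stdout.
my $out = run_to_file([$^X, '-e', 'print "aligned\n"'], $tmp_name);
print defined $out ? $out : "IPC::Run not installed\n";
```

Swapping the stand-in command for the real bowtie-maptool invocation would show whether the illegal seek still appears without the wrapper in the path.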

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release 
(http://bowtie-bio.sourceforge.net/index.shtml)...

cheers MAJ

----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module


> Hi,
> 
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
> 
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing modules, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
> 
> There are a few issues I'm having, so I apologise for the list:
> 
>     1. Informal tests run correctly (outside the t/ tree and Test
>        harness), but formal Test harness tests fail for reasons I
>        cannot understand. (The module is still lacking a lot of tests,
>        but since things were failing in the harness I have placed them
>        as a lower priority and have been working to my micro-script
>        tests - yes, bad form.)
>     2. I am having a big problem with IPC::Run for one of the
>        executables (the module can call 5 different executables for 7
>        commands), bowtie-maptool (module command 'map'). All the other
>        commands tested (this excludes bowtie-maqconvert [convert
>        command]) work fine, but maptool fails with an illegal seek -
>        presumably due to the redirection handling? I have no idea how
>        to resolve this, so help would be greatly appreciated (a small
>        script that demonstrates the use that results in the failure is
>        below).
> 
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
> 
> Thanks for any help.
> Dan
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::Tools::Run::Bowtie;
> 
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
> 
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
> -command             => 'single',
> -max_seed_mismatches => 2,
> -seed_length         => 28,
> -max_qual_mismatch   => 70,
> -sam_format          => 0
> );
> 
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
> 
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
> -command             => 'map'
> );
> 
> my $map = $bowtiemap->run($align); # throws Illegal seek
> 
> print "$map\n";
> 
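> open(my $in, '<', $map) or die "Cannot open $map: $!";
> my @lines = <$in>;   # read the whole map output
> my $lines = @lines;  # number of lines read
> print @lines;
> print "\n\n$lines\n";
> close $in;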
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From dan.kortschak at adelaide.edu.au  Tue Dec 22 05:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <F7513FBADF944B51823A5F22FFA85911@NewLife>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
	<F7513FBADF944B51823A5F22FFA85911@NewLife>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a
specified file and run it that way, but I've tried the redirect as you
suggest, with the same failure result. I think it's a strangeness of
maptool (which may well be a reason for it being dropped - also note the
maptool output doesn't seem reasonable for the test data provided even
when run from the command line).

It's probably a result of a difficult interaction between IPC::Run and
maptool. Any funkiness in your code is not likely to be the cause: I've
closely analysed what is being passed to IPC::Run, and I've quite
extensively modified the IPC::Run handling method from your code to
account for the difference between a single executable with many
commands, which the base code managed, and a cluster of executables
each taking a small subset of different filespecs, which is what bowtie
needs. My funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module.

(Should test maqconvert - if it fails, this will be dropped also unless
someone asks otherwise).

When the module copes with 0.11.* properly I'll start thinking about
0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan, 
> It looks like if the outfile isn't specified on the commandline for
> maptool, then the align is written to stdout. So, you could 
> try this workaround in Bowtie/Config.pm:
> 
> our %command_files = (
>     'single'     => [qw( ind seq #out )],
>     'paired'     => [qw( ind seq seq2 #out )],
>     'crossbow'   => [qw( ind seq #out )],
>     'build'      => [qw( ref out )],
>     'inspect'    => [qw( ind >#out )],
>     'convert'    => [qw( bwt out bfa )],
> -    'map'        => [qw( bwt #out )]
> +    'map'        => [qw( bwt >#out )]
>     );
> 
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
> 
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release 
> (http://bowtie-bio.sourceforge.net/index.shtml)...
> 
> cheers MAJ


From lovebaby39 at gmail.com  Wed Dec 23 10:48:55 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Wed, 23 Dec 2009 18:48:55 +0800
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
	<107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>
	<9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
	<15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>

Dear all

I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how 
to get "P.pastoris DNA for pPIC9K expression vector".

    while (my $result_u =  $blast_report_u-> next_result ) {
        while (my $hit_u = $result_u->next_hit()){
            while (my $hsp_u = $hit_u->next_hsp()){
                    $hit_u->name;
                    $hsp_u->evalue;
                    $hsp_u->score;
            }
        }
    }

I would appreciate it if you could tell me how to do it.

P.S. How can I download the BioPerl manual? Is there a download link?


The following is the BLAST result:
-------------------------------------------------------------------------------------------------------------------------------------
BLASTN 2.2.16 [Mar-25-2007]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
Query=
         (458 letters)

Database: UniVec (build 4.0)
           2416 sequences; 597,480 total letters
Searching..................................................done
                                                                             
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1

>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
          Length = 2781

 Score = 26.3 bits (13), Expect = 3.1
 Identities = 13/13 (100%)
 Strand = Plus / Plus

Query: 352  tactaccgccatt 364
            |||||||||||||
Sbjct: 2209 tactaccgccatt 2221
-------------------------------------------------------------------------------------------------------------------------------------

Reginald Hsueh 


From hrh at fmi.ch  Wed Dec 23 15:14:06 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 23 Dec 2009 16:14:06 +0100
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>
Message-ID: <C757F24E.5FE2%hrh@fmi.ch>

Hi

Assuming you are using "SearchIO", try:

$hit_u->description

for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO
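
A minimal sketch of how that slots into the loop from the original post. The helper name hit_rows is made up for illustration; it relies only on the next_result, next_hit, next_hsp, name, description, evalue and score methods that SearchIO result, hit and HSP objects provide.

```perl
use strict;
use warnings;

# Hypothetical helper: collect one row per HSP from a SearchIO-style
# report object, including the hit description next to the hit name.
sub hit_rows {
    my ($report) = @_;
    my @rows;
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @rows, [
                    $hit->name,         # e.g. gnl|uv|Z46234.1:664-3444
                    $hit->description,  # e.g. P.pastoris DNA for pPIC9K expression vector
                    $hsp->evalue,
                    $hsp->score,
                ];
            }
        }
    }
    return @rows;
}
```

With BioPerl installed, the report object would come from something like Bio::SearchIO->new(-format => 'blast', -file => 'report.blast'), where report.blast is a placeholder file name, and each row can be printed with join("\t", @$row).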


Regards, Hans




From pkuonline at gmail.com  Wed Dec 23 18:36:49 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 12:36:49 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
Message-ID: <200912231236490784820@gmail.com>

Hi Everyone,

I used the latest BioPerl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz, and tried to parse CODEML results. I searched the mailing list and found that the current PAML parser is compatible with PAML 4.3a (http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html). However, Ziheng Yang recently updated PAML to 4.3b, and I found that the parser does not work with it. More strangely, I tested it on old PAML 4.1 results and it also failed.

I have attached my CODEML outputs in case anyone has an idea.

Many thanks in advance!
 				
Best regards,
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.1
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0016.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.1
Type: application/octet-stream
Size: 11635 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0017.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc4.3b
Type: application/octet-stream
Size: 11330 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0018.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rst4.3b
Type: application/octet-stream
Size: 60616 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20091223/6e91b939/attachment-0019.obj>

From cjfields at illinois.edu  Wed Dec 23 21:19:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 23 Dec 2009 15:19:48 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231236490784820@gmail.com>
References: <200912231236490784820@gmail.com>
Message-ID: <B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>

Well, not completely unexpected, but very frustrating nonetheless.  Changes to PAML output have broken the parser in just about every PAML revision.  Unfortunately I'm not sure when this will be addressed; my hope is sooner rather than later.

Can you file a bioperl bug report for this?  It's the best place to keep track.

http://bugzilla.open-bio.org/

chris



From pkuonline at gmail.com  Wed Dec 23 22:45:54 2009
From: pkuonline at gmail.com (pkuonline)
Date: Wed, 23 Dec 2009 16:45:54 -0600
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
Message-ID: <200912231645536094087@gmail.com>

Hi Chris,

Thanks for your reply and I just submitted this bug to bugzilla. 

Have a nice holiday!
-------------------------------------------------------------
Yong Zhang
Ph.D, Research Scholar
Manyuan Long's Lab
University of Chicago



From David.Messina at sbc.su.se  Wed Dec 23 23:23:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 24 Dec 2009 00:23:44 +0100
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se>

Hi Yong,

Could you attach your codeml output to the bug report, too?

I'll take a look at this as soon as I can.


Dave


From maj at fortinbras.us  Thu Dec 24 05:47:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 24 Dec 2009 00:47:10 -0500
Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1
In-Reply-To: <200912231645536094087@gmail.com>
References: <200912231236490784820@gmail.com>,
	<B2C762F5-63F5-4A2D-8207-7A1EFF698B63@illinois.edu>
	<200912231645536094087@gmail.com>
Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife>

Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ


From bhakti.dwivedi at gmail.com  Sat Dec 26 02:46:51 2009
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Fri, 25 Dec 2009 21:46:51 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
Message-ID: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>

Hi,

Does anyone know how to retrieve the "Source" or the "Species name" given
the accession number using BioPerl?  I have 30,000 accession numbers
for which I need to get the source organisms.  Any kind of help will be
appreciated.

Thanks

BD


From maj at fortinbras.us  Sat Dec 26 03:52:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 25 Dec 2009 22:52:10 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
        -db    => 'taxonomy',
        -id    => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
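
The chunking could be sketched roughly as follows. The helper name in_batches, the batch size and the pause are illustrative assumptions, not part of BioPerl; the callback is where the elink/esummary code above would go.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per batch of at most $size ids,
# pausing $pause seconds between batches (not after the last one).
sub in_batches {
    my ($ids, $size, $pause, $callback) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @pending = @$ids;    # copy, so the caller's array is left untouched
    my $batches = 0;
    while (@pending) {
        my @batch = splice(@pending, 0, $size);
        $callback->(\@batch);
        $batches++;
        sleep $pause if $pause && @pending;
    }
    return $batches;
}

# Example: 30,000 made-up ids in batches of 1,000. The pause is 0 here
# only so that the demonstration finishes immediately.
my @all_ids = (1 .. 30_000);
my $n = in_batches(\@all_ids, 1000, 0, sub {
    my ($batch) = @_;
    # ... run the elink/esummary queries on @$batch here ...
});
print "processed $n batches\n";    # prints "processed 30 batches"
```

For real queries, a pause of about 3 seconds, as suggested above, would replace the 0.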
MAJ


From cjfields at illinois.edu  Sat Dec 26 11:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <AD7C8B9A-61D1-443C-952E-BC7C66E398B2@illinois.edu>


On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ

The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec.

chris


From arpm9 at charter.net  Sun Dec 27 21:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l]  Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

hi chris,
 I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm


From pengyu.ut at gmail.com  Tue Dec 29 16:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython to
comment on the pros and cons of bioperl and biopython? I'm sending
this email to both the bioperl and biopython mailing lists, but I hope
that it will not result in any contention.

I assume that the functionality of bioperl and biopython is the
same, i.e., tasks that can be done in bioperl can be done in biopython
and vice versa, as both libraries have been out there for over 10 years.
Please correct me if my understanding is wrong.

Given a task that can be done with either bioperl or biopython,
I want to know, in particular, how long it will take to write the
code for the task in bioperl and biopython, with the same readability
requirement (see below) and the assumption that users have the same
fluency in perl and python.

python is claimed to be good for maintainability. But perl is
criticized for there-are-many-ways-for-a-given-task. Since there are
multiple ways in perl, let us assume that we always use perl in a
readable way.


From jason at bioperl.org  Tue Dec 29 16:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or  
just curious about the advantages/disadvantages of language choice?

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From ak at ebi.ac.uk  Tue Dec 29 16:57:18 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython to
> comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both the bioperl and biopython mailing lists, but I hope
> that it will not result in any contention.
> 
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is wrong.
> 
> Given a task that can be done with either bioperl or biopython,
> I want to know, in particular, how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
> 
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is
the same:  Which of the two programming languages are you (or your team)
most proficient in?  Use that language.

Regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From sdavis2 at mail.nih.gov  Tue Dec 29 17:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> May I ask somebody who is versatile in both bioperl and biopython to
> comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both the bioperl and biopython mailing lists, but I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is wrong.

The two projects have similar goals, but saying that the functionality
is the same would be an extreme oversimplification.  You will need to
define what you want to do and then check to see what the two projects
have to offer.  This will, in general, require perusing the websites
for both projects as well as the relevant documentation.

> Given a task that can be done with either bioperl or biopython,
> I want to know, in particular, how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.

Again, you will want to define the task(s) to be accomplished and then
weigh the pros and cons of each project combined with local expertise.
 If you don't know what you want to do, then you can certainly read
some examples on the websites and see which project strikes you as a
"winner" for you.

> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

These two statements are generalizations that provide little insight
into the strengths or weaknesses of the languages.  In other words,
one can write good or bad code in both languages.

Hope that helps.

Sean


From wenzhiwang1983 at yahoo.com.cn  Tue Dec 29 18:30:02 2009
From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi)
Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST)
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com>

Dear Jason,

Plink is a very useful program in the population genetics, especially in the Genome-Wide SNP scan era. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO?

Thanks.

Wenzhi Wang
   State Key Laboratory of Genetic Resources and Evolution
   Kunming Institute of Zoology, Chinese Academy of Sciences
   Kunming, Yunnan 650223 P. R. China
   Tel:  86 871 5198 993
   Fax: 86 871 5195 430
   E-mail: wenzhiwang1983 at yahoo.com.cn




From pengyu.ut at gmail.com  Tue Dec 29 18:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.



From pengyu.ut at gmail.com  Tue Dec 29 19:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> May I ask somebody who is versatile in both bioperl and biopython
>> comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality between bioperl or biopython is the
>> same, i.e., tasks can be done in bioperl can be done biopython and
>> vice versa, as both libraries have been out there over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification.  You will need to
> define what you want to do and then check to see what the two projects
> have to offer.  This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

According to your experience, are there some tasks that are easier
with one than with another?

>> Given that a task that can be done with either bioperl or biopython,
>> I, in particularly, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>
> Again, you will want to define the task(s) to be accomplished and then
> weigh the pros and cons of each project combined with local expertise.
>  If you don't know what you want to do, then you can certainly read
> some examples on the websites and see which project strikes you as a
> "winner" for you.
>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> These two statements are generalizations that provide little insight
> into the strengths or weaknesses of the languages.  In other words,
> one can write good or bad code in both languages.
>
> Hope that helps.
>
> Sean
>


From alperyilmaz at gmail.com  Tue Dec 29 19:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO,
	Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID: <dac81b0d0912291136x53edf2cjc6728e7062bd3bc1@mail.gmail.com>

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by
using a specific node as the root. I used Bio::Tree::Draw::Cladogram for
drawing and encountered some problems.

When I use whole tree and draw it, everything is fine; but, when I
pick a particular node and construct the subtree from that node's
ancestor by using "my $subtree = Bio::Tree::Tree->new(-root =>
$new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a
faulty EPS file, which contains extra lines added in the middle of the
file.
For instance:
.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
 81.25 lineto
  lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto


Also, I tried to write the subtree into a new phyloxml file first,
then draw it. The code is shown as follows:
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;

# Write the subtree to a new phyloxml file.
my $savefile = "save.phyloxml";
my $treeout  = Bio::TreeIO->new(-format => 'phyloxml',
                                -file   => ">$savefile");
$treeout->write_tree($subtree);

# Read the saved tree back in and draw it.
my $tree2 = Bio::TreeIO->new(-format => 'phyloxml',
                             -file   => $savefile);
my $t1 = $tree2->next_tree;
my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree   => $t1,
                                           -top    => 10,
                                           -bottom => 10);
$obj1->print(-file => $image_output);

The generated phyloxml file, save.phyloxml, has an additional newline
between "</phylogeny>" and "</phyloxml>" at the end of the file. This
additional newline leads to an error when parsing (opening the file and
drawing the EPS). After I removed the newline manually,
Bio::Tree::Draw::Cladogram produced the EPS file successfully.

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?

The phyloxml file I used can be downloaded from:
http://grassius.org/download/HSF.phyloxml

Run this code with the phyloxml file to see newline character problem:
http://pastebin.com/f87ee1ee

Run this code with the phyloxml file to see faulty eps file problem:
http://pastebin.com/fc4715a1

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From pengyu.ut at gmail.com  Tue Dec 29 21:32:17 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 15:32:17 -0600
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>

http://bioperl.org/Core/Latest/modules.html

Many links, if not all, are broken on the above page. Could somebody fix them?

For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
I see the following error.

There is currently no text in this page. You can search for this page
title in other pages, search the related logs, or edit this page.


From jason at bioperl.org  Tue Dec 29 21:49:00 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:49:00 -0800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>

That is an outdated URL; I am not sure where you are linking to it from.
We can probably now disable all old '/Core' URLs.

All documentation links are in the /wiki/

The beginner's howto is here for example
  http://bioperl.org/wiki/HOWTO:Beginners

> http://www.bioperl.org/wiki/HOWTOs


On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:

> http://bioperl.org/Core/Latest/modules.html
>
> Many links if not all are broken on the above pages. Could somebody  
> fix it?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 21:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID: <AA645194-F78E-4484-8952-02C40C1270F4@bioperl.org>

Yep, it would be great if someone were to write it.  This being a volunteer
project, we welcome your contribution.  No, I don't specifically have
plans to do it, but maybe you, or another bioperl user/developer
interested in population genetics, can give it a try?

-jason
On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> Plink is a very useful program in the population genetics,  
> especially in the Genome-Wide SNP scan era. Is there any plan to add  
> the Plink (ped or tped) format to Bio::PopGen::IO?
>
> Thanks.
>
> Wenzhi Wang
>   State Key Laboratory of Genetic Resources and Evolution
>   Kunming Institute of Zoology, Chinese Academy of Sciences
>   Kunming, Yunnan 650223 P. R. China
>   Tel:  86 871 5198 993
>   Fax: 86 871 5195 430
>   E-mail: wenzhiwang1983 at yahoo.com.cn
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jason at bioperl.org  Tue Dec 29 21:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org>


On Dec 29, 2009, at 11:15 AM, Peng Yu wrote:

> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>  
> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com>  
>> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the  
>> functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two  
>> projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?

As you have still failed to give much insight into the 'tasks' it is  
hard to give you a better answer.

If there is a module or set of routines already written then yes one  
might be easier than the other. Otherwise it just depends on your  
strengths in the programming language.
We discussed the strengths of the different toolkits briefly on the  
podcast last month.  http://twit.tv/floss96

I echo Sean. Use whichever language you are a better programmer in.
BioPerl is more mature in some facets than BioPython, but BioPython
has some components that are more heavily developed and supported than
in BioPerl (structures being one of those, and interfacing them to pyMol
would be a strength).  I personally think the Gbrowse, Bio-Graphics,
and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to sequence
databases and features is a critical aspect of mining genomic data,
and I use these heavily in my work, which makes BioPerl easy and
powerful for my tasks. That, and sequence and alignment parsing and
reformatting.  But there are comparable tools written in python, with
and without BioPython, that you can also use, so mainly it is about
building up expertise in a toolkit and going forward.  The BioPerl
faithful will probably say it is the more useful toolkit to us, but we are
of course a biased sample.

Both projects can benefit from more users and developers contributing  
code and documentation so I would just jump in and give it a try if  
you are unsure which will be easier for you.

>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same  
>>> readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and  
>> then
>> weigh the pros and cons of each project combined with local  
>> expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From pengyu.ut at gmail.com  Tue Dec 29 22:01:05 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:01:05 +1800
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and CDS boundary for 	a RefSeq ID?
Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>

I see the following example, but it is not clear to me how to get the
exon sequences. I also want to get the exon boundaries and the associated
CDS boundaries. Although I can get the boundary information from the UCSC
table browser, it would be convenient if I could get it in bioperl
along with the sequence.

Could somebody let me know how to do it?

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html


From sdavis2 at mail.nih.gov  Tue Dec 29 22:13:30 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 17:13:30 -0500
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com>

On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> http://bioperl.org/Core/Latest/modules.html
>
> Many links if not all are broken on the above pages. Could somebody fix it?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.

It is unfortunate that the links are broken on that page.  However, I
believe that page is somewhat outdated, anyway.  Here are the HOWTO
pages:

http://www.bioperl.org/wiki/HOWTOs

Sean


From pengyu.ut at gmail.com  Tue Dec 29 22:21:16 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:21:16 +1800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
	<A867B98E-D3D4-4693-8655-F633086E925D@bioperl.org>
Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com>

On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich <jason at bioperl.org> wrote:
> That is an outdated URL I am not sure where you are linking it from. We can
> probably now disable all old '/Core' URLs.

I'm linked from here.

http://www.bioperl.org/wiki/BioPerl_Tutorial

Since those URLs are outdated, could you please fix the links on the above page?

> All documentation links are in the /wiki/
>
> The beginner's howto is here for example
>  http://bioperl.org/wiki/HOWTO:Beginners
>
>> http://www.bioperl.org/wiki/HOWTOs
>
>
> On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:
>
>> http://bioperl.org/Core/Latest/modules.html
>>
>> Many links if not all are broken on the above pages. Could somebody fix
>> it?
>>
>> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
>> I see the following error.
>>
>> There is currently no text in this page. You can search for this page
>> title in other pages, search the related logs, or edit this page.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From sdavis2 at mail.nih.gov  Tue Dec 29 23:06:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 18:06:17 -0500
Subject: [Bioperl-l] How to download the exon sequences,
	and the exon and 	CDS boundary for a RefSeq ID?
In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I see the following example. But it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and associated
> CDS boundaries. Although, I can get the boundary information from ucsc
> table browser, but it would be convenient if I can get it in bioperl
> along with the sequence.
>
> Could somebody let me know how do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.  There may be some confusion, as the UCSC database aligns
RefSeq sequence to a genome to generate exon start and end
coordinates.  However, the RefSeq records retrieved by Bio::DB::RefSeq
are not in genomic context and so do not have start and end locations
on the genome.  That is, if you want the starts and ends along the
genome, that information is not available from the RefSeq record
itself, I don't think.  If that is what you need (genomic
coordinates), you can download the information directly from UCSC,
download flat files from NCBI mapview, or even from ensembl (using
biomart, for instance).  If you are looking for a bioperl-compliant
way of doing this, look at the Ensembl Perl API.
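
If coordinates relative to the record itself are enough, a minimal
sketch along the following lines may help. It assumes the RefSeq record
has been saved locally in GenBank format and that it carries 'exon' and
'CDS' features, which not every record does. The sub name exon_cds_rows
and the file name in the usage comment are made up for illustration.
The positions it reports are relative to the record, not to the genome.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect [tag, start, end, sequence] for every exon and CDS feature
# annotated on a Bio::Seq object. Coordinates are relative to the record.
sub exon_cds_rows {
    my ($seq) = @_;
    my @rows;
    for my $feat ($seq->get_SeqFeatures) {
        my $tag = $feat->primary_tag;
        next unless $tag eq 'exon' || $tag eq 'CDS';
        # spliced_seq joins the pieces of a split location, e.g. join(...)
        push @rows, [ $tag, $feat->start, $feat->end, $feat->spliced_seq->seq ];
    }
    return @rows;
}

# Usage: perl exon_cds.pl NM_record.gb   (hypothetical file name)
if (@ARGV) {
    my $in  = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    my $seq = $in->next_seq or die "no record found in $ARGV[0]\n";
    print join("\t", @$_), "\n" for exon_cds_rows($seq);
}
```

For genomic coordinates you would still need one of the sources listed
above.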

Sean


From jkhilmer at gmail.com  Tue Dec 29 19:55:18 2009
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 29 Dec 2009 12:55:18 -0700
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>

Personally, I think that the differences between Python and Perl
(although substantial) are not large enough to make the language
itself the deciding factor.

Instead, consider the larger community of software.  I haven't yet
found a situation in which Python cannot be applied: it can be used
with R (statistics); lower-level code in C or Fortran; visualization
software such as PyMol, Chimera, Blender, VTK; plotting with
matplotlib; and scipy/numpy or sage, which provide innumerable
benefits for computation, data-processing, etc.

Although I don't claim to have a great deal of experience with Perl, I
haven't seen the same integration with that language: I'm assuming it
can be used with R and VTK (not sure about C or Fortran?).  For this
reason, unless your work is highly targeted and you have no use for
programming-language integration with other software, I would
recommend Python.

For perl experts, I would truly appreciate any corrections you could
offer to these observations of mine, since I wouldn't mind using perl
if it offers benefits either in general or for specific applications.


Jonathan

On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification.  You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer.  This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?
>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>>  If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages.  In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


From wgheath at gmail.com  Tue Dec 29 20:16:39 2009
From: wgheath at gmail.com (William Heath)
Date: Tue, 29 Dec 2009 12:16:39 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and
	biopython?
In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
	<81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com>
Message-ID: <f08ddf990912291216h32988b8cv20830c1b6701caf6@mail.gmail.com>

The biggest reason to go with python is ease of use.  Biologists are not
programmers, and the learning curve for python is much gentler than that of
perl.  I like perl but choose python because of this issue.  Perl 6 does
address some of these issues, but it has not been fully implemented
yet.

-Tim

P.S.

I love, love, love cpan though which is only for perl right now :(

On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer <jkhilmer at gmail.com> wrote:

> Personally, I think that the differences between Python and Perl
> (although substantial) are not large enough to make the language
> itself the deciding factor.
>
> Instead, consider the larger community of software.  I haven't yet
> found a situation in which Python cannot be applied: it can be used
> with R (statistics); lower-level code C or fortran; visualization
> software such as PyMol, Chimera, Blender, VTK; plotting with
> matplotlib; and scipy/numpy or sage, which provide innumerable
> benefits for computation, data-processing, etc.
>
> Although I don't claim to have a great deal of experience with Perl, I
> haven't seen the same integration with that language: I'm assuming it
> can be used with R and VTK (not sure about C or fortran?).  For this
> reason, unless your work is highly targeted and you have no use
> programming language integration with other software, I would
> recommend Python.
>
> For perl experts, I would truly appreciate any corrections you could
> offer to these observations of mine, since I wouldn't mind using perl
> if it offers benefits either in general or for specific applications.
>
>
> Jonathan
>
> On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>> May I ask somebody who is versatile in both bioperl and biopython
> >>> comment on the pros and cons of bioperl and biopython? I'm sending
> >>> this email to both bioperl and biopython mailing lists. But I hope
> >>> that it will not result in any contention.
> >>>
> >>> I assume that the functionality between bioperl or biopython is the
> >>> same, i.e., tasks can be done in bioperl can be done biopython and
> >>> vice versa, as both libraries have been out there over 10 years.
> >>> Please correct me if my understanding is not true.
> >>
> >> The two projects have similar goals, but saying that the functionality
> >> is the same would be an extreme oversimplification.  You will need to
> >> define what you want to do and then check to see what the two projects
> >> have to offer.  This will, in general, require perusing the websites
> >> for both projects as well as the relevant documentation.
> >
> > According to your experience, are there some tasks that are easier
> > with one than with another?
> >
> >>> Given that a task that can be done with either bioperl or biopython,
> >>> I, in particularly, want to know how long it will take to write the
> >>> code for the task in bioperl and biopython, with the same readability
> >>> requirement (see below) and the assumption that users have the same
> >>> fluency in perl and python.
> >>
> >> Again, you will want to define the task(s) to be accomplished and then
> >> weigh the pros and cons of each project combined with local expertise.
> >>  If you don't know what you want to do, then you can certainly read
> >> some examples on the websites and see which project strikes you as a
> >> "winner" for you.
> >>
> >>> python is claimed to be good for maintainability. But perl is
> >>> criticized for there-are-many-ways-for-a-given-task. Since there are
> >>> multiple ways in perl, let us assume that we always use perl in a
> >>> readable way.
> >>
> >> These two statements are generalizations that provide little insight
> >> into the strengths or weaknesses of the languages.  In other words,
> >> one can write good or bad code in both languages.
> >>
> >> Hope that helps.
> >>
> >> Sean
> >>
> >


From pengyu.ut at gmail.com  Wed Dec 30 17:26:45 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Thu, 31 Dec 2009 11:26:45 +1800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>

With Bio::SeqIO, I can only read the records in a fasta file one by
one. This is preferable if there are many records in a file.

But I also want to read all the records in at once. I could use a while
loop to read them all, but could somebody let me know if there is a
function in bioperl that reads in all the records at once and returns
an object?

http://www.bioperl.org/wiki/HOWTO:SeqIO


From sdavis2 at mail.nih.gov  Wed Dec 30 18:04:53 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 30 Dec 2009 13:04:53 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>

On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?

In perl, you can use an array to store the records.  You could also
use a hash if you have reasonable keys for the entries.
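
For example, a minimal sketch of the while-loop approach. The sub name
slurp_fasta is made up, and keying the hash on display_id assumes the
IDs in the file are unique (a later duplicate would silently replace an
earlier record in the hash, though not in the array).

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Read every record of a FASTA file into memory.
# Returns an array ref (file order) and a hash ref keyed on display_id.
sub slurp_fasta {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my (@seqs, %by_id);
    while (my $seq = $in->next_seq) {
        push @seqs, $seq;
        $by_id{ $seq->display_id } = $seq;   # assumes unique IDs
    }
    return (\@seqs, \%by_id);
}

# Usage: perl slurp.pl my.fas
if (@ARGV) {
    my ($seqs, $by_id) = slurp_fasta($ARGV[0]);
    printf "%d records read\n", scalar @$seqs;
}
```

Everything is held in memory at once, so this only suits files that fit
comfortably in RAM.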

Sean


> http://www.bioperl.org/wiki/HOWTO:SeqIO


From jason at bioperl.org  Wed Dec 30 19:58:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 30 Dec 2009 11:58:54 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com>
Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org>

Or use a database object so you can retrieve sequences that have a
particular id. See Bio::DB::Fasta.
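
A minimal sketch of that approach, assuming the FASTA file sits in a
directory where an index file can be written (Bio::DB::Fasta builds one
on first use). The sub name fetch_subseq and the file and sequence
names in the usage comment are made up.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Return bases $start..$end (1-based, inclusive) of sequence $id.
sub fetch_subseq {
    my ($fasta, $id, $start, $end) = @_;
    # Indexes the file on first use, or reuses an existing index.
    my $db = Bio::DB::Fasta->new($fasta);
    return $db->seq($id, $start, $end);
}

# Usage: perl fetch.pl my.fas seq1 1 50
if (@ARGV == 4) {
    my $sub = fetch_subseq(@ARGV);
    print "$sub\n" if defined $sub;
}
```

If you want a sequence object rather than a string, the same database
handle also offers get_Seq_by_id($id).
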
On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:

> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>> by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and  
>> return
>> me an object?
>
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Wed Dec 30 21:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

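use Bio::AlignIO;

$alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
$aln = $alnio->next_aln;    # reads the whole file as one alignment
@seqs = $aln->each_seq;     # the method is each_seq, not each_seqs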

MAJ
----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?


> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the records at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO


From David.Messina at sbc.su.se  Thu Dec 31 10:55:32 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 31 Dec 2009 11:55:32 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
Message-ID: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave