[Bioperl-l] Re (3): Status of assembly modules

Florent Angly florent.angly at gmail.com
Mon Dec 20 04:08:05 UTC 2010


Hi Lee,

I was able to fix the bug you reported regarding the contig IDs in the 
developement verison of Bioperl.

For the other bug, please file a bug report at 
http://bugzilla.open-bio.org/ and give me the URL. Provide the file that 
you used so that we can reproduce the bug. Also, tell me how much memory 
you have on your machine, as I suspect that you may be running out of 
memory because of the size of your dataset and the way the Bioperl 
assembly modules deal with contigs and scaffolds.

Thank you,

Florent


On 16/12/10 10:15, Chris Fields wrote:
> Lee,
>
> You are more than welcome to look at the code to optimize it; might be worth looking athe they way scaffolds, contigs, etc are defined within one aonther.  I believe Florent Angly has been actively working on these modules; Florent may have thoughts on this.
>
> chris
>
> On Dec 10, 2010, at 5:53 PM, Lee Katz wrote:
>
>> I am wondering if there is a way to optimize the BioPerl code for Assembly
>> IO.  Specifically, when I convert a 2.2 MB genome (~200 contigs) from a 454
>> ace file to a regular ace file, it takes a few hours to get through 30
>> contigs using the code below (I estimate more than a day to get through all
>> of it).
>>
>> Is there a way to optimize it?  To convert a sequence file to another format
>> at most would take a minute and therefore converting an ace on the magnitude
>> of hours or days is too much.  I wish I understood bioperl better but I
>> think the best I can do is issue a challenge or a feature request:  who can
>> speed up Assembly::IO::ace?
>>
>> # convert a Newbler ace to a standard ace
>> sub _newblerAceToAce($args){
>>   my($self,$args)=@_;
>>   my
>> $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454');
>>   my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace");
>> #output ace
>>   my $numContigs=`grep -c ^CO $$args{ace454Path}`+0;
>>   logmsg "Converting $$args{ace454Path} (454-ace) to $$args{acePath} (ace).
>> $numContigs contigs total.";
>>   while(my $contig=$ace454->next_contig){
>>     logmsg "Finished with ".$contig->id ." out of $numContigs";
>>     $ace->write_contig($contig);
>>   }
>>   return $$args{acePath};
>> }
>>
>>
>> Message: 3
>>
>> Date: Mon, 22 Nov 2010 15:18:10 -0500
>>
>> From: Lee Katz<lskatz at gatech.edu>
>>
>> Subject: [Bioperl-l] Re(2): Status of assembly modules
>>
>> To: bioperl-l at lists.open-bio.org
>>
>> Message-ID:
>>
>>        <AANLkTi=JShCLsHDxHK4eeWD3Da=vWmRkGN2rkuLwCjxn at mail.gmail.com>
>>
>> Content-Type: text/plain; charset=UTF-8
>>
>>
>> I figured it out (I haven't tested much though).
>>
>>
>> To whoever works on Assembly::IO::ace.pm:
>>
>> I changed a regular expression on line 231 because the contig object was not
>>
>> initializing properly.  For some reason the 454 ace file had adopted the
>>
>> reference assembly's ID and therefore there was a GI number followed by a
>>
>> pipe.  The pipe was not captured with \w+.  I think that the regex will be
>>
>> safe with \s(\S+)\s.
>>
>>
>> if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
>>
>> #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts!
>>
>>
>> On Thu, Nov 18, 2010 at 12:04 PM,<bioperl-l-request at lists.open-bio.org
>>> wrote:
>>
>>> Message: 3
>>> Date: Wed, 17 Nov 2010 22:20:03 -0500
>>> From: Lee Katz<lskatz at gatech.edu>
>>> Subject: Re: [Bioperl-l] Status of assembly modules
>>> To: bioperl-l at lists.open-bio.org
>>> Message-ID:
>>>        <AANLkTi=aFAnYrEXj3D4joZeYwxRT971M_ZYR0uFJOrxc at mail.gmail.com>
>>> Content-Type: text/plain; charset=UTF-8
>>> I have read on the BioPerl site that a 454 ace is not standardized due to
>>> its coordinate system.  How can I convert it to the standard ace file?
>>> When I run this code either by using contig or assembly objects, I get an
>>> error.
>>> Can't call method "get_consensus_sequence" on an undefined value at
>>> Bio/Assembly/IO/ace.pm line 280,<GEN0>  line 93349.
>>>    sub _newblerAceToAce($args){
>>>      my($self,$args)=@_;
>>>      my
>>> $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace
>> ",-variant=>'454');
>>
>>>      my
>>> $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace");
>>>      #while(my $contig=$ace454->next_contig){
>>>      while(my $scaffold=$ace454->next_assembly){
>>>        print Dumper $scaffold;
>>>      }
>>>      return $args{acePath};
>>>    }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list