[Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl

Abhishek Pratap abhishek.vit at gmail.com
Wed Jan 7 21:50:51 EST 2009


Thanks Joshua.
I will use it and get back to you if we have any questions here.

Best,
-Abhi

On Wed, Jan 7, 2009 at 12:57 AM, Joshua Udall <jaudall at gmail.com> wrote:

> Done.  Let me know if you have any questions.  Here are the comments I
> included with the submission (plus a few additions):
>
> Attached is code to facilitate ace file IO - particularly of large ace
> files.  The code reads ace contig entries one at a time, rather than all
> at once, in the following manner:
> $contig = $stream->next_contig
>
> It will write ace files to a text file using:
> $stream->write_contig($contig)
>
> General Usage:
> my $contig_io =
> Bio::Assembly::ContigIO->new(-file=>$ace_filename,-format=>'ace');
> while (defined (my $contig = $contig_io->next_contig() ) )
> {
> # do something here.
> }
>
> The general usage above should be familiar to those using bioperl.  It is
> obviously different than the AssemblyIO which also uses a '->next' stream
> and an ace.pm file (in the IO dir).  I found that very confusing because I
> haven't often had multiple assemblies that I need to parse and it seems like
> overkill.
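
A note on the one-contig-at-a-time idea described above: it can be sketched
in plain Perl with no BioPerl dependency.  The sketch below is not the
ContigIO code attached to the bug report.  ace_contig_iterator is a
hypothetical helper, and it assumes a simplified ACE layout in which each
contig block starts at a line beginning with "CO " and runs to the next such
line or to end of file (trailing WA/CT/RT tag blocks, if present, would end
up in the last block).  Only one contig is held in memory at a time, and the
contigs named in %wanted are copied out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: returns a closure that yields
# one contig record per call, or undef when the input is exhausted.
# Assumes every contig block begins with a line starting "CO ".
sub ace_contig_iterator {
    my ($fh) = @_;
    my $pending;    # CO header line read ahead during the previous call
    return sub {
        my $header = $pending;
        undef $pending;
        # Skip the AS header and anything else before the first CO line.
        until (defined $header) {
            my $line = <$fh>;
            return unless defined $line;    # end of file, no more contigs
            $header = $line if $line =~ /^CO\s/;
        }
        my @lines = ($header);
        while (defined(my $line = <$fh>)) {
            if ($line =~ /^CO\s/) {         # start of the next contig
                $pending = $line;
                last;
            }
            push @lines, $line;
        }
        my ($name) = $header =~ /^CO\s+(\S+)/;
        return { name => $name, lines => \@lines };
    };
}

# Tiny made-up ACE fragment, used here only to exercise the iterator.
my $ace = <<'ACE';
AS 2 2

CO Contig1 4 1 1 U
ACGT

BQ
20 20 20 20

CO Contig2 3 1 1 U
TTA

BQ
30 30 30
ACE

open my $in, '<', \$ace or die "Cannot open in-memory ACE data: $!";
my $next_contig = ace_contig_iterator($in);

my %wanted = (Contig2 => 1);    # contig names to keep
my @seen;                       # every contig name encountered, in order
my $kept = '';                  # raw text of the wanted contigs

while (defined(my $contig = $next_contig->())) {
    push @seen, $contig->{name};
    $kept .= join '', @{ $contig->{lines} } if $wanted{ $contig->{name} };
}
close $in;

print "contigs seen: @seen\n";
print $kept;
```

A file filtered this way would likely still need its AS header line rewritten
with the new contig and read counts before other tools accept it.  Trimming
to a region within a contig needs real parsing of the read and alignment
records, which this sketch does not attempt.
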
>
> The main files are ContigIO.pm and the ace.pm in the ContigIO dir.  I've
> attached other files that are in the bundle too.  We did this some time ago
> and though the files have the same author info at the top, we've made a few
> changes to them.
>
> Several months ago, I found that the recently discussed LocatableSeq bug
> was causing problems for me with this code.  Not imagining that I could have
> actually found a bioperl bug myself, I made my own simple workaround by
> adjusting the 'end' value.  If the LocatableSeq bug has been fixed, this
> module should work fine.  I'm simply commenting that it is untested with
> 1.6.
>
> I've also attached the files submitted to bugzilla to this message as per
> Abhishek's request.  Good luck.
>
> Josh
>
>
>
>
> On Tue, Jan 6, 2009 at 3:22 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>> Could you archive the files and attach them to a bug report?  (You can mark
>> it as an enhancement request.)  We can take a look.
>>
>> http://bugzilla.open-bio.org/
>>
>> chris
>>
>>
>> On Jan 6, 2009, at 5:13 PM, Joshua Udall wrote:
>>
>>> Chris et al. -
>>>
>>> A student and I have written code to do this - write ace files as well as
>>> parse them one entry at a time.  In trying to use the Assembly::IO as it was
>>> in 1.5, we ran into problems with large ace files containing many entries
>>> because of file handle limit issues with the inherited implementation
>>> DB_File.  Our implementation simply reads one contig at a time instead of
>>> first trying to slurp the whole ace into memory.  I'm happy to add it to
>>> Bioperl, but I am not sure how to do it.  If I sent *.pm files to someone,
>>> could they help me get it into bioperl?  It may not be perfect either, but
>>> it should be a good start.
>>>
>>> Josh
>>>
>>> On Tue, Jan 6, 2009 at 1:52 PM, Chris Fields <cjfields at illinois.edu>
>>> wrote:
>>>
>>>> Not at this time (write_assembly is not implemented).  If you come up with
>>>> code to do so let us know (patches are always welcome).
>>>>
>>>> chris
>>>>
>>>>
>>>> On Jan 6, 2009, at 2:43 PM, Abhishek Pratap wrote:
>>>>
>>>>> Thanks, that helped.
>>>>>
>>>>> Is there any method to write ACE files?
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> On Tue, Jan 6, 2009 at 3:36 PM, Smithies, Russell <
>>>>> Russell.Smithies at agresearch.co.nz> wrote:
>>>>>
>>>>>> Here's how I've been doing it:
>>>>>>
>>>>>> use Bio::Assembly::IO;
>>>>>>
>>>>>> my $infile = "454Contigs.ace";
>>>>>> my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace")
>>>>>>     or die $!;
>>>>>> my $assembly = $parser->next_assembly;
>>>>>>
>>>>>> # to work with a named contig
>>>>>> my @wanted_id = ("Contig100");
>>>>>> my ($contig) = $assembly->select_contigs(@wanted_id) or die $!;
>>>>>>
>>>>>> #get the consensus
>>>>>> my $consensus = $contig->get_consensus_sequence();
>>>>>>
>>>>>> #get the consensus qualities
>>>>>> my @quality_values  = @{$contig->get_consensus_quality()->qual()};
>>>>>>
>>>>>> hope this helps,
>>>>>>
>>>>>> Russell
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>>
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>>>>>>> Sent: Tuesday, 6 January 2009 6:43 p.m.
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl
>>>>>>>
>>>>>>> Hi All
>>>>>>>
>>>>>>> I am looking for some code to parse the ACE file format. I have big ACE
>>>>>>> files which I would like to trim based on a user-defined contig name and
>>>>>>> a specific region, writing the output to a fresh ACE file.
>>>>>>>
>>>>>>> For now I am trying to tweak Bio::Assembly::IO, but it is kind of slow.
>>>>>>> Are there any alternatives or suggestions?
>>>>>>>
>>>>>>> Thanks All,
>>>>>>> -Abhi
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -----------------------------
>>>>>>> Abhishek Pratap
>>>>>>> Bioinformatics Software Engineer
>>>>>>> Institute for Genome Sciences
>>>>>>> School of Medicine, Univ of Maryland
>>>>>>> 801, W. Baltimore Street, Baltimore, MD 21209
>>>>>>> Ph: (+1)-410-706-2296
>>>>>>> www.igs.umaryland.edu/
>>>>>>>
>>>>>>> Chair
>>>>>>> RSG-Worldwide
>>>>>>> ISCB-Student Council
>>>>>>> http://iscbsc.org/rsg
>>>>>>>
>>>>>>> www.bioinfosolutions.com
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>>
>>> --
>>> Joshua Udall
>>> Assistant Professor
>>> 295 WIDB
>>> Plant and Wildlife Science Dept.
>>> Brigham Young University
>>> Provo, UT 84602
>>> 801-422-9307
>>> Fax: 801-422-0008
>>> USA
>>>
>>
>>
>
>




