[Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl

Mark A. Jensen maj at fortinbras.us
Thu Jan 8 03:11:13 UTC 2009


Abhi/Josh- Please give me a shout if LocatableSeq gives you any throws or if you 
get strange coordinates. We're hoping it's well-fixed, but...
thanks-Mark
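
A quick way to sanity-check coordinates, as a minimal sketch in core Perl
rather than anything taken from LocatableSeq itself: it assumes the end
coordinate should equal the start plus the number of non-gap residues minus
one, and that '-' and '.' are the gap characters. expected_end is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Expected end coordinate for a gapped read placed at $start,
# ASSUMING each non-gap character occupies exactly one position
# and that '-' and '.' are the only gap characters.
sub expected_end {
    my ($start, $gapped_seq) = @_;
    my $residues = () = $gapped_seq =~ /[^-.]/g;    # count non-gap characters
    return $start + $residues - 1;
}

print expected_end(5, 'AC--GT'), "\n";    # 4 residues starting at 5 -> 8
```

If the end reported for a read differs from this value, that read is worth
a closer look.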
----- Original Message ----- 
From: "Abhishek Pratap" <abhishek.vit at gmail.com>
To: "Joshua Udall" <jaudall at gmail.com>
Cc: "Chris Fields" <cjfields at illinois.edu>; <bioperl-l at lists.open-bio.org>; 
"Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Sent: Wednesday, January 07, 2009 9:50 PM
Subject: Re: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl


> Thanks Joshua.
> I will use it and get back to you if we have any questions here.
>
> Best,
> -Abhi
>
> On Wed, Jan 7, 2009 at 12:57 AM, Joshua Udall <jaudall at gmail.com> wrote:
>
>> Done.  Let me know if you have any questions.  Here are the comments I
>> included with the submission (plus a few additions):
>>
>> Attached is code to facilitate ace file IO - particularly of large ace
>> files.  Instead of loading everything at once, the code reads ace contig
>> entries one at a time, in the following manner:
>> $contig = $stream->next_contig;
>>
>> It will write ace files to a text file using:
>> $stream->write_contig($contig)
>>
>> General Usage:
>> my $contig_io =
>>     Bio::Assembly::ContigIO->new(-file => $ace_filename, -format => 'ace');
>> while (defined(my $contig = $contig_io->next_contig())) {
>>     # do something here.
>> }
>>
>> The general usage above should be familiar to those using bioperl.  It
>> differs from Bio::Assembly::IO, which also uses a '->next' stream and an
>> ace.pm file (in the IO dir).  I found that interface confusing: I rarely
>> have multiple assemblies to parse, so iterating over whole assemblies
>> seems like overkill.
>>
>> The main files are ContigIO.pm and the ace.pm in the ContigIO dir.  I've
>> attached other files that are in the bundle too.  We did this some time ago
>> and though the files have the same author info at the top, we've made a few
>> changes to them.
>>
>> Several months ago, I found that the recently discussed LocatableSeq bug
>> was causing problems for me with this code.  Not imagining that I could have
>> actually found a bioperl bug myself, I made my own simple workaround by
>> adjusting the 'end' value.  If the LocatableSeq bug has been fixed, this
>> module should work fine.  I'm simply commenting that it is untested with
>> 1.6.
>>
>> I've also attached the files submitted to bugzilla to this message as per
>> Abhishek's request.  Good luck.
>>
>> Josh
>>
>>
>>
>>
>> On Tue, Jan 6, 2009 at 3:22 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>>> Could you archive the files and attach them to a bug report (you can mark
>>> it as an enhancement request).  We can take a look.
>>>
>>> http://bugzilla.open-bio.org/
>>>
>>> chris
>>>
>>>
>>> On Jan 6, 2009, at 5:13 PM, Joshua Udall wrote:
>>>
>>>> Chris et al. -
>>>>
>>>> A student and I have written code to do this - write ace files as well as
>>>> parse them one entry at a time.  In trying to use the Assembly::IO as it
>>>> was in 1.5, we ran into problems with large ace files containing many
>>>> entries because of file handle limit issues with the inherited DB_File
>>>> implementation.  Our implementation simply reads one contig at a time
>>>> instead of first trying to slurp the whole ace into memory.  I'm happy to
>>>> add it to Bioperl, but I am not sure how to do it.  If I sent *.pm files
>>>> to someone, could they help me get it into bioperl?  It may not be
>>>> perfect either, but it should be a good start.
>>>>
>>>> Josh
>>>>
>>>> On Tue, Jan 6, 2009 at 1:52 PM, Chris Fields <cjfields at illinois.edu>
>>>> wrote:
>>>>
>>>>> Not at this time (write_assembly is not implemented).  If you come up
>>>>> with code to do so, let us know (patches are always welcome).
>>>>>
>>>>> chris
>>>>>
>>>>>
>>>>> On Jan 6, 2009, at 2:43 PM, Abhishek Pratap wrote:
>>>>>
>>>>>> Thanks, that helped.
>>>>>>
>>>>>> Is there any method to write Ace files?
>>>>>>
>>>>>> Thanks,
>>>>>> -Abhi
>>>>>>
>>>>>> On Tue, Jan 6, 2009 at 3:36 PM, Smithies, Russell <
>>>>>> Russell.Smithies at agresearch.co.nz> wrote:
>>>>>>
>>>>>>> Here's how I've been doing it:
>>>>>>>
>>>>>>> use Bio::Assembly::IO;
>>>>>>>
>>>>>>> my $infile = "454Contigs.ace";
>>>>>>> my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace")
>>>>>>>     or die $!;
>>>>>>> my $assembly = $parser->next_assembly;
>>>>>>>
>>>>>>> # to work with a named contig
>>>>>>> my @wanted_id = ("Contig100");
>>>>>>> my ($contig) = $assembly->select_contigs(@wanted_id)
>>>>>>>     or die "Contig $wanted_id[0] not found";
>>>>>>>
>>>>>>> # get the consensus
>>>>>>> my $consensus = $contig->get_consensus_sequence();
>>>>>>>
>>>>>>> # get the consensus qualities
>>>>>>> my @quality_values = @{ $contig->get_consensus_quality()->qual() };
>>>>>>>
>>>>>>> hope this helps,
>>>>>>>
>>>>>>> Russell
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>>>>>>>> Sent: Tuesday, 6 January 2009 6:43 p.m.
>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>> Subject: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl
>>>>>>>>
>>>>>>>> Hi All
>>>>>>>>
>>>>>>>> I am looking for some code to parse the ACE file format. I have big
>>>>>>>> ACE files which I would like to trim based on a user-defined contig
>>>>>>>> name and specific region, and write the output to a fresh ACE file.
>>>>>>>>
>>>>>>>> For now I am trying to tweak Bio::Assembly::IO, but it is kind of
>>>>>>>> slow. Are there any other alternatives or suggestions?
>>>>>>>>
>>>>>>>> Thanks All,
>>>>>>>> -Abhi
>>>>>>>>
>>>>>>>> --
>>>>>>>> -----------------------------
>>>>>>>> Abhishek Pratap
>>>>>>>> Bioinformatics Software Engineer
>>>>>>>> Institute for Genome Sciences
>>>>>>>> School of Medicine, Univ of Maryland
>>>>>>>> 801, W. Baltimore Street, Baltimore, MD 21209
>>>>>>>> Ph: (+1)-410-706-2296
>>>>>>>> www.igs.umaryland.edu/
>>>>>>>>
>>>>>>>> Chair
>>>>>>>> RSG-Worldwide
>>>>>>>> ISCB-Student Council
>>>>>>>> http://iscbsc.org/rsg
>>>>>>>>
>>>>>>>> www.bioinfosolutions.com
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Joshua Udall
>>>> Assistant Professor
>>>> 295 WIDB
>>>> Plant and Wildlife Science Dept.
>>>> Brigham Young University
>>>> Provo, UT 84602
>>>> 801-422-9307
>>>> Fax: 801-422-0008
>>>> USA
>>>>
>>>
>>>
>>
>>
>>
>
>
>
>
> 
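
The contig-at-a-time reading Josh describes in the thread can be
illustrated without BioPerl. What follows is a minimal sketch in core Perl,
not the submitted ContigIO code: make_contig_reader and _new_contig are
hypothetical names, and the only assumption about the ace format is that
each contig record starts with a line beginning "CO <name>". Records are
kept as raw lines; reads and qualities are not parsed.

```perl
use strict;
use warnings;

# Return an iterator (a closure) that yields one contig record per call.
# Each record is a hash ref: { name => 'Contig1', lines => [ raw lines ] }.
# ASSUMPTION: a contig record starts at a line beginning with "CO <name>".
sub make_contig_reader {
    my ($fh) = @_;
    my $pending;    # a "CO" header already read but not yet returned
    return sub {
        my $contig;
        if (defined $pending) {
            $contig = _new_contig($pending);
            undef $pending;
        }
        while (defined(my $line = <$fh>)) {
            if ($line =~ /^CO\s+\S+/) {
                if ($contig) {
                    # Start of the next record: save it and hand back this one.
                    $pending = $line;
                    return $contig;
                }
                $contig = _new_contig($line);
            }
            elsif ($contig) {
                push @{ $contig->{lines} }, $line;
            }
            # Lines before the first "CO" (e.g. the "AS" header) are skipped.
        }
        return $contig;    # the last contig, or undef once input is exhausted
    };
}

# Build a fresh record from a "CO" header line.
sub _new_contig {
    my ($header) = @_;
    my ($name) = $header =~ /^CO\s+(\S+)/;
    return { name => $name, lines => [$header] };
}

# Demo on a tiny in-memory example instead of a real file.
my $ace = <<'ACE';
AS 2 2

CO Contig1 4 1 1 U
ACGT

CO Contig2 3 1 1 U
GGA
ACE

open my $in, '<', \$ace or die "Cannot open in-memory ACE text: $!";
my $next_contig = make_contig_reader($in);
my @names;
while (defined(my $contig = $next_contig->())) {
    push @names, $contig->{name};
}
print join(',', @names), "\n";
```

Only one record is held at a time, so memory use follows the largest
contig rather than the whole file. In this sketch any whole-assembly tags
that trail the last contig would simply be attached to the last record.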



