[Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Jan 6 22:49:42 UTC 2009


Sounds like a good plan but I wouldn't know where to start. 
That level of Perlyness is a bit beyond me  :-(

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 7 January 2009 11:13 a.m.
> To: Smithies, Russell
> Cc: 'Abhishek Pratap'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] Parser: Ace file (Sequence Assembly) in
> Bioperl
> 
> How about re-implementing Bio::Assembly classes so they simply map to
> Bio::DB::SeqFeature::Store (or similar) methods?  Scaffold could just
> be a wrapper around a Bio::DB::SeqFeature::Store (which can be BDB/
> mysql/postgresql/memory) and return Contigs.
> 
> Similarly, the IO classes could probably act as specialized
> Bio::DB::SeqFeature::Store::Loader classes for the database and just
> return the Scaffold instance.
> 
> chris
> 
> On Jan 6, 2009, at 3:31 PM, Smithies, Russell wrote:
> 
> > I agree with the need for a faster parser.
> > Although the current version does a great job, it is slow and memory
> > intensive as it loads everything into Bio::Assembly::Scaffold
> > objects composed of Bio::Assembly::Contig objects.
> > I'm not sure exactly what the best solution would be, perhaps a new
> > constructor with a named contig would simplify things?
> >
> >    $io = new Bio::Assembly::IO(-file=>"454_assy.ace",-format=>"ace");
> >
> >    $contig = $io->next_assembly_with_contig(-contig=>"Contig000100")->select_contig;
> >
> > Or do we even need a next_assembly method?
> > Can there be more than one assembly in an .ace file?
> >
> > --Russell
> >
> >
> >
> > From: Abhishek Pratap [mailto:abhishek.vit at gmail.com]
> > Sent: Wednesday, 7 January 2009 10:07 a.m.
> > To: Chris Fields
> > Cc: Smithies, Russell; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Parser: Ace file (Sequence Assembly) in
> > Bioperl
> >
> > OK, sure. If we do write something (which I will eventually have
> > to :) I will forward it.
> >
> > @Russell:
> >
> > I feel that the current method is very slow for getting info on a
> > specific contig, as it tries to store the info for all contigs in
> > memory. That can be memory intensive, especially with the next-gen
> > data coming from 454 sequencers. I think we should grep for the
> > contig(s) of interest and then create a record only for those.
> > Please correct me if I am wrong.
> >
> > Thanks,
> > -Abhi
> > On Tue, Jan 6, 2009 at 3:52 PM, Chris Fields
> > <cjfields at illinois.edu> wrote:
> > Not at this time (write_assembly is not implemented).  If you come
> > up with code to do so let us know (patches are always welcome).
> >
> > chris
> >
> >
> > On Jan 6, 2009, at 2:43 PM, Abhishek Pratap wrote:
> > Thanks that helped.
> >
> > Any method to write Ace files ?
> >
> > Thanks,
> > -Abhi
> >
> > On Tue, Jan 6, 2009 at 3:36 PM, Smithies, Russell
> > <Russell.Smithies at agresearch.co.nz> wrote:
> > Here's how I've been doing it:
> >
> >
> > use strict;
> > use warnings;
> > use Bio::Assembly::IO;
> >
> > my $infile = "454Contigs.ace";
> > my $parser = Bio::Assembly::IO->new(-file => $infile, -format => "ace")
> >     or die "Could not open $infile: $!";
> > my $assembly = $parser->next_assembly;
> >
> > # to work with a named contig
> > my @wanted_id = ("Contig100");
> > my ($contig) = $assembly->select_contigs(@wanted_id)
> >     or die "Contig $wanted_id[0] not found in $infile";
> >
> > # get the consensus
> > my $consensus = $contig->get_consensus_sequence();
> >
> > # get the consensus qualities
> > my @quality_values = @{ $contig->get_consensus_quality()->qual() };
> >
> > hope this helps,
> >
> > Russell
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Abhishek Pratap
> > Sent: Tuesday, 6 January 2009 6:43 p.m.
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Parser: Ace file (Sequence Assembly) in Bioperl
> >
> > Hi All
> >
> > I am looking for some code to parse the ACE file format. I have big
> > ACE files which I would like to trim based on a user-defined contig
> > name and specific region, writing the output to a fresh ACE file.
> >
> > For now I am trying to tweak Bio::Assembly::IO, but it is kind of
> > slow. Any other alternatives or suggestions?
> >
> > Thanks All,
> > -Abhi
> >
> > --
> > -----------------------------
> > Abhishek Pratap
> > Bioinformatics Software Engineer
> > Institute for Genome Sciences
> > School of Medicine, Univ of Maryland
> > 801, W. Baltimore Street, Baltimore, MD 21209
> > Ph: (+1)-410-706-2296
> > www.igs.umaryland.edu/
> >
> > Chair
> > RSG-Worldwide
> > ISCB-Student Council
> > http://iscbsc.org/rsg
> >
> > www.bioinfosolutions.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > ======================================================================
> > Attention: The information contained in this message and/or
> > attachments from AgResearch Limited is intended only for the persons
> > or entities to which it is addressed and may contain confidential
> > and/or privileged material. Any review, retransmission, dissemination
> > or other use of, or taking of any action in reliance upon, this
> > information by persons or entities other than the intended recipients
> > is prohibited by AgResearch Limited. If you have received this message
> > in error, please notify the sender immediately.
> > ======================================================================
> >

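A minimal sketch of the "grep for the contig of interest" idea raised in the thread, in plain Perl with no BioPerl classes. It assumes the usual ACE layout in which each contig record starts with a "CO <name> ..." line and runs until the next "CO" line, and that the file header has the form "AS <contigs> <reads>". The sub names (extract_contig, count_reads, write_single_contig_ace) are hypothetical and are not part of Bio::Assembly::IO. Trimming to a region inside the contig and handling of trailing tag blocks such as WA{ or CT{ are not attempted here.

```perl
use strict;
use warnings;

# Stream an ACE file and return the text of one contig record: from its
# "CO <name> ..." line up to, but not including, the next "CO" line.
# Only the wanted record is kept in memory. Returns undef if not found.
sub extract_contig {
    my ($fh, $wanted) = @_;
    my @record;
    my $in_wanted = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            last if $in_wanted;              # next contig starts: done
            $in_wanted = ($1 eq $wanted);
        }
        push @record, $line if $in_wanted;
    }
    return @record ? join('', @record) : undef;
}

# Count the reads in an extracted record by counting its "RD" lines.
sub count_reads {
    my ($record) = @_;
    my $n = () = $record =~ /^RD\s/mg;
    return $n;
}

# Write a one-contig ACE file: a fresh "AS" header, then the record.
sub write_single_contig_ace {
    my ($out, $record) = @_;
    printf {$out} "AS 1 %d\n\n", count_reads($record);
    print {$out} $record;
}

# Demo on a tiny made-up ACE fragment held in a string (illustrative data).
my $demo_ace = <<'ACE';
AS 2 2

CO Contig1 4 1 1 U
ACGT

BQ
20 20 20 20

AF read1 U 1
BS 1 4 read1

RD read1 4 0 0
ACGT

QA 1 4 1 4

CO Contig2 5 1 1 U
ACGTA

BQ
30 30 30 30 30

AF read2 U 1
BS 1 5 read2

RD read2 5 0 0
ACGTA

QA 1 5 1 5
ACE

open my $in, '<', \$demo_ace or die "Cannot open in-memory ACE: $!";
my $record = extract_contig($in, 'Contig2');
close $in;
die "Contig2 not found\n" unless defined $record;
write_single_contig_ace(\*STDOUT, $record);
```

For a real file, the in-memory handle would be replaced by open my $in, '<', '454Contigs.ace', and STDOUT by a handle opened for writing on the output file. Because the scan stops at the first "CO" line after the wanted record, memory use depends on the size of that one contig rather than on the whole assembly.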



