[Bioperl-l] How to read in the whole fasta file in the memory?

Jean-Marc Frigerio INRA Jean-Marc.Frigerio at pierroton.inra.fr
Mon Jan 4 14:12:18 UTC 2010


> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut at gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l at lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: "bioperl-l at lists.open-bio.org" <bioperl-l at lists.open-bio.org>
> Message-ID:
> 	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
> 
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
> 
> Sean
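
A minimal sketch of the array-and-hash suggestion, assuming BioPerl is installed: the usual Bio::SeqIO loop pushes every record onto an array and also indexes it in a hash keyed on display_id. The helper name load_fasta and the demo file contents are invented for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Hypothetical helper: read every record of a FASTA file into memory.
# Returns an array ref (file order) and a hash ref keyed on display_id.
sub load_fasta {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my (@seqs, %by_id);
    while (my $seq = $in->next_seq) {
        push @seqs, $seq;
        $by_id{ $seq->display_id } = $seq;    # a repeated ID overwrites the earlier record
    }
    return (\@seqs, \%by_id);
}

# Demo on a small temporary file (contents invented for the example).
my ($fh, $path) = tempfile(SUFFIX => '.fa', UNLINK => 1);
print {$fh} ">seq1 first\nACGTACGT\n>seq2 second\nGGCC\n";
close $fh;

my ($seqs, $by_id) = load_fasta($path);
printf "%d records, seq2 = %s\n", scalar @$seqs, $by_id->{seq2}->seq;
```

The array keeps the file order and the hash gives direct lookup by ID, at the cost of holding every sequence object in memory at once.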
> 
> 
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 30 Dec 2009 11:58:54 -0800
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> Or use a database object so you can retrieve sequences that have a
> particular ID. See Bio::DB::Fasta.
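
A short sketch of the Bio::DB::Fasta route, assuming BioPerl and a DBM backend are available. The file name, IDs and sequences are invented for the example. The module indexes the file on disk instead of loading it, so lookups by ID work without holding all records in memory.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;
use File::Temp qw(tempdir);

# Demo file in a scratch directory; Bio::DB::Fasta writes its index next to it.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.fa";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} ">seq1\nACGTACGT\n>seq2\nGGCC\n";
close $fh;

my $db  = Bio::DB::Fasta->new($path);    # builds the index, or reuses an existing one
my $seq = $db->get_Seq_by_id('seq1');    # sequence object fetched by ID
my $sub = $db->seq('seq1', 2, 5);        # subsequence string, 1-based inclusive coordinates

printf "%s has length %d; bases 2-5 are %s\n",
    $seq->display_id, $seq->length, $sub;
```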
> On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:
> 
>> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>>> one. This is preferable if there are many records in a file.
>>>
>>> But I also want to read all the records in. I could use a while loop
>>> to read all records in. But could somebody let me know if there is a
>>> function in bioperl that can read in all the record at once and return
>>> me an object?
>> In perl, you can use an array to store the records.  You could also
>> use a hash if you have reasonable keys for the entries.
>>
>> Sean
>>
>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen" <maj at fortinbras.us>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: "Peng Yu" <pengyu.ut at gmail.com>, <bioperl-l at lists.open-bio.org>
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
> 
> I think you might want Bio::AlignIO:
> 
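> use Bio::AlignIO;
> # Read the file as one alignment, then list its sequences.
> my $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
> my $aln   = $alnio->next_aln;
> my @seqs  = $aln->each_seq;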
> 
> MAJ
> ----- Original Message ----- 
> From: "Peng Yu" <pengyu.ut at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 30, 2009 12:26 PM
> Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
> 
> 
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO


Hi,

I wrote and currently use a module I named Bio::SeqIO::multifasta, which
is basically a copy of Bio::SeqIO::fasta plus a few methods:
get_by_id(), get_by_order(), first_seq() and previous_seq().
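
As a rough illustration only (this is not the module's code), an in-memory reader offering these four accessors might look like the pure-Perl sketch below. The package name InMemoryFasta, the 1-based get_by_order() and the cursor behaviour of previous_seq() are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT Bio::SeqIO::multifasta: keeps {id, seq} records
# in memory with a cursor, so records can be revisited in any order.
package InMemoryFasta;

sub new {
    my ($class, $text) = @_;
    my $self = bless { recs => [], by_id => {}, pos => 0 }, $class;
    for my $chunk (split /^>/m, $text) {
        next unless $chunk =~ /\S/;                # skip the empty leading field
        my ($header, @lines) = split /\n/, $chunk;
        my ($id) = split ' ', $header;             # ID is the first word of the header
        next unless defined $id;
        my $rec = { id => $id, seq => join('', @lines) };
        push @{ $self->{recs} }, $rec;
        $self->{by_id}{$id} = $rec;
    }
    return $self;
}

# Return the record at the cursor and advance, or undef at the end.
sub next_seq {
    my $s = shift;
    return $s->{pos} < @{ $s->{recs} } ? $s->{recs}[ $s->{pos}++ ] : undef;
}

# Rewind to the start and return the first record.
sub first_seq { my $s = shift; $s->{pos} = 0; return $s->next_seq }

# Step back to the record before the one most recently returned.
sub previous_seq {
    my $s = shift;
    return if $s->{pos} < 2;
    $s->{pos} -= 2;
    return $s->next_seq;
}

# Direct access by position (1-based, an assumption) or by ID.
sub get_by_order { my ($s, $n)  = @_; return $n >= 1 ? $s->{recs}[ $n - 1 ] : undef }
sub get_by_id    { my ($s, $id) = @_; return $s->{by_id}{$id} }

package main;

my $fa     = InMemoryFasta->new(">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n");
my $first  = $fa->first_seq;
my $second = $fa->next_seq;
my $back   = $fa->previous_seq;
printf "%s %s %s %s\n", $first->{id}, $second->{id}, $back->{id},
    $fa->get_by_id('seq2')->{seq};
```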

The module would need review and validation. Should I submit it to Bugzilla?

	-- jmf


