[BioPython] Re: question regarding writing SeqRecord objects in Fasta format

Mon Jul 18 20:21:04 EDT 2005

Ann Loraine wrote:

> Hello,
>
> To answer your question - I read in the fasta records like so:
>
> import gzip
> from Bio import Fasta
> fh = gzip.GzipFile('seqs.fa.gz')
> parser = Fasta.RecordParser()
> iterator = Fasta.Iterator(fh,parser)
> curr_record = iterator.next()
>
> I was following the example in this tutorial Web page:
>
> http://www.biopython.org/docs/tutorial/Tutorial003.html#toc7
>
>
> "Let's make all of this talk more concrete by using the Iterator and 
> Record interfaces to do what we did before -- extract a unique list of 
> all species in our FASTA file. First we need to set up our parser and 
> iterator:
> >>> from Bio import Fasta
> >>> parser = Fasta.RecordParser()
> >>> file = open("ls_orchid.fasta")
> >>> iterator = Fasta.Iterator(file, parser)"
>
> Should I be using the SeqIO method instead to read fasta records if I 
> want to write some of them out to a fasta format file?
>
> -Ann
>
>

Yes, if you want to use SeqIO for output, use it for input as well. When 
reading using Bio.Fasta.Iterator, you are creating Bio.Fasta.Record 
instances, which do not have the 'id' attribute. When reading using 
Bio.SeqIO.FastaReader, you are creating a Bio.SeqRecord instance, which 
is a different representation of a sequence. But Bio.SeqIO.FASTA does 
have a writing method, so you may want to use that.
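
To make the mismatch concrete, here is a minimal sketch in plain Python. 
TitleRecord, IdRecord, to_id_record, and write_fasta are hypothetical 
stand-ins written for illustration, not Biopython classes or functions, 
and taking the first word of the title line as the id is an assumption. 
The idea is only that a writer which looks for 'id' cannot use a record 
that carries nothing but a title line, unless the record is converted first.

```python
import io
from dataclasses import dataclass


@dataclass
class TitleRecord:
    """Hypothetical stand-in for a record that stores only a title line."""
    title: str
    sequence: str


@dataclass
class IdRecord:
    """Hypothetical stand-in for a record that stores an explicit id."""
    id: str
    description: str
    seq: str


def to_id_record(rec):
    """Convert a title-based record; assumes the first word of the title is the id."""
    parts = rec.title.split(None, 1)
    rec_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return IdRecord(id=rec_id, description=description, seq=rec.sequence)


def write_fasta(records, handle, width=60):
    """Write id-based records as FASTA text, wrapping sequence lines at `width`."""
    for rec in records:
        # This attribute access is what fails for a title-only record.
        header = (">" + rec.id + " " + rec.description).rstrip()
        handle.write(header + "\n")
        for start in range(0, len(rec.seq), width):
            handle.write(rec.seq[start:start + width] + "\n")


old_style = TitleRecord(title="seq1 example orchid sequence", sequence="ACGT" * 20)
out = io.StringIO()
write_fasta([to_id_record(old_style)], out)
fasta_text = out.getvalue()
print(fasta_text)
```

Passing old_style straight to write_fasta would raise an AttributeError 
on 'id', which is the kind of failure discussed above. Reading and writing 
through the same module avoids the conversion step altogether.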

The reason Biopython has two ways of representing sequences is 
basically historical: both implementations were committed to CVS, 
approved, and code grew around both. Not exactly optimal, I know.

HTH,

./I

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://ffas.ljcrf.edu/~iddo