[Biopython] parsing a fasta with multiple entries

Peter biopython at maubp.freeserve.co.uk
Mon Apr 26 22:04:15 UTC 2010


On Mon, Apr 26, 2010 at 8:05 PM, Nick Leake wrote:
> Thanks Peter,
>
> All of the information is very helpful.  I apologize for sending a second
> email.  I was thinking that the first email was going to be discarded for
> having the attachment - which in hindsight is an obvious fact.  At that
> time, I had only seen the initial email rejecting the first.

I managed to reply before sending the original email (without
attachment) to the list - so partly my fault.

>>> I want to be able to access the DNA sequences for manipulation and
>>> later removal from a chromosomal region.  I originally thought that I
>>> could follow the same fasta format example shown in the biopython
>>> tutorial.  However, that failed to work.  I think it might be because
>>> there are multiple entries.
>>
>> The Bio.SeqIO.read() function is for when there is a single record. The
>> Bio.SeqIO.parse() function is for when you have multiple records. Could
>> you clarify which bit of the tutorial was confusing? We'd like to make it
>> better.
>
> The tutorial I used was from
> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html

OK, good - that is the current version.

> I will admit I didn't really know the difference between the
> Bio.SeqIO.read() and Bio.SeqIO.parse() functions, even though they should
> be intuitive.  Still, the mentioned tutorial doesn't seem to have an
> example of parsing multiple entries. This is where my naiveté and
> confusion on the matter probably started.

It does (the file ls_orchid.fasta used in several examples has 94
entries), but I guess there is a lot of information in there and it can be
overwhelming.
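
For what it's worth, here is a minimal sketch of the read() versus parse()
difference. It assumes a recent Biopython on Python 3, and the two-record
FASTA text (seq1, seq2) is made up for illustration so that the example does
not depend on any file on disk:

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA data, held in memory for illustration only.
fasta_text = """>seq1 first made-up entry
ACGTACGTAC
>seq2 second made-up entry
GGGTTTAAA
"""

# parse() returns an iterator of SeqRecord objects, one per FASTA entry.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# read() expects exactly one record, so it raises ValueError on this data.
try:
    SeqIO.read(StringIO(fasta_text), "fasta")
    read_failed = False
except ValueError:
    read_failed = True
print("read() failed on multi-record data:", read_failed)
```

With a real file you would pass the filename (or an open handle) in place of
the StringIO object, and loop over parse() directly rather than building a
list if the file is large.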

Your problems with the funny EMBL file probably didn't help :(

Peter



