[Biopython-dev] Updates to the tutorial for parsing GenBank files

Wed Dec 14 13:33:15 EST 2005

Marc Colosimo wrote:
> The patch looks good to me, but I could have missed something there. I
> forgot about the Discussion List. I really should join that list.

Motion seconded - any developer want to accept this?

> Also, I will probably be filing a bug on the Bio.Fasta documentation.
> There are two basic doc changes that should be made:
> 
> Under the doc for Fasta:
> RecordParser: Parses FASTA sequence data into a Record object <- change
> to a Fasta.Record object, which is not the same as a Seq.Record

Sounds sensible

> Cookbooks:
> 
> Then maybe in the Cookbook, give an example of using
> Fasta.SequenceParser with title2ids. Without title2ids, you don't get a
> name or id. You only get the description, which is the title. Fasta.Record
> only has title, which maybe should be renamed to (deprecated in favor of)
> description to match the default behavior of SequenceParser.

I don't usually bother with the title2ids function either.

I agree it is odd that the attribute is .title or .description depending
on the parser used (Fasta.RecordParser or Fasta.SequenceParser).

> It seems odd that the Fasta stuff is buried within Chapter 2 (2.4.3
> Making it easier). Plus, that section is missing "import string".

Yes, but I think it would be better to avoid using the string module 
completely, and use the split method of the string object instead:

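from Bio import Fasta

def parseTitle2Ids(title):
    # Split the title on "|" and keep the first three fields.
    return title.split("|")[:3]

# Have the parser take each record's identifiers from its title.
parser = Fasta.SequenceParser(title2ids = parseTitle2Ids)
handle = open("ls_orchid.fasta")
iterator = Fasta.Iterator(handle, parser)
...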
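
To see what the split call returns on its own, here is a minimal sketch that
does not need Biopython at all. The title string is only an illustrative
example in the NCBI pipe-delimited style, and the function name
parse_title_to_ids is hypothetical. How the parser assigns the three
returned fields is not shown here.

```python
def parse_title_to_ids(title):
    # Split a pipe-delimited FASTA title and keep the first three fields.
    return title.split("|")[:3]

# Illustrative title line (assumed format, without the leading ">").
title = "gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene"

fields = parse_title_to_ids(title)
print(fields)  # ['gi', '2765658', 'emb']
```

Because split is a method on the string itself, no "import string" is
needed.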

Peter