[BioPython] what to use for working with fasta sequences and alignments?

Peter (BioPython List) biopython at maubp.freeserve.co.uk
Wed Jan 10 15:58:28 UTC 2007


Jan Kosinski wrote:
> Hi,
> 
> I am quite new to BioPython and a little confused about how to use 
> it for working with fasta sequences and alignments.
> 
> For instance, I can read and parse fasta files with Bio.Fasta, return 
> records (as Fasta.record class), iterate and so on. But then I turn 
> to the Bio.Fasta.FastaAlign module, which offers the FastaAlignment 
> class (a subclass of the Alignment class). However, this class has very 
> few methods, and get_all_seqs and get_seq_by_num return SeqRecord 
> objects instead of Fasta.record (why??), which makes it hard to combine 
> Bio.Fasta.FastaAlign (with SeqRecord) for alignments with Bio.Fasta 
> (with Fasta.record) for sequences. Maybe I am wrong, but Biopython seems 
> to be full of incompatibilities. Or should one know which modules and 
> classes not to use?
>
> Could you recommend what I should use for my work with fasta 
> sequences and alignments? Which BioPython modules and classes?

You can use Bio.Fasta to read in files either as Fasta.Record objects 
or as SeqRecord objects.  I would use SeqRecord objects: they are more 
general, which helps should you ever want to use a different input file 
format, and, as you have noticed, the alignment object also uses 
SeqRecord objects to hold each (gapped) sequence.
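
To illustrate why one general record type is convenient, here is a 
minimal sketch in plain Python.  It does not use Biopython at all: 
SimpleRecord, parse_fasta, make_record and alignment_column are 
hypothetical names made up for this example, and SimpleRecord is only a 
stand-in for the idea behind SeqRecord, not its real interface.  The 
point is just that the same records returned by the parser can be used 
directly as the rows of a (gapped) alignment.

```python
from dataclasses import dataclass
from io import StringIO


@dataclass
class SimpleRecord:
    """Hypothetical stand-in for a general sequence record (NOT Biopython's SeqRecord)."""
    id: str
    description: str
    seq: str


def make_record(title, chunks):
    """Build a record from a FASTA title line (without '>') and its sequence lines."""
    parts = title.split(None, 1)
    record_id = parts[0] if parts else ""
    return SimpleRecord(record_id, title, "".join(chunks))


def parse_fasta(handle):
    """Yield one SimpleRecord per FASTA entry read from an open text handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous entry, if there was one.
            if title is not None:
                yield make_record(title, chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield make_record(title, chunks)


def alignment_column(records, index):
    """Return one column of an alignment whose rows are equal-length gapped records."""
    return "".join(record.seq[index] for record in records)


# Two gapped sequences of equal length, as in a FASTA-format alignment.
fasta_text = ">seq1 first\nACG-T\n>seq2 second\nACGGT\n"
records = list(parse_fasta(StringIO(fasta_text)))
print([record.id for record in records])
print(alignment_column(records, 3))
```

Because the parser and the alignment helper share one record type, no 
conversion step is needed between "sequence" code and "alignment" code.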

There are other options if you search the code, but Bio.Fasta is the 
best documented and most widely used.

If you are brave, you might have a look at the new code in Bio.SeqIO, 
which you can get from CVS.  It is still in a state of flux, but the 
Fasta parsing is much faster.  See this page and the mailing list 
archives for more:

http://www.biopython.org/wiki/SeqIO
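
As a rough idea of what reading a FASTA file through Bio.SeqIO looks 
like, here is a short sketch.  It assumes the SeqIO.parse(handle, 
format) interface found in released versions of Biopython; since the 
CVS code is still changing, the exact calls available there may differ.  
The sequences are made-up example data, read from a string so that no 
file is needed.

```python
from io import StringIO

from Bio import SeqIO  # assumes a Biopython release that includes Bio.SeqIO

# Made-up example data; in practice you would pass an open file handle.
fasta_text = ">seq1 first example\nACGT-ACGT\n>seq2 second example\nACGTTACGT\n"

# Each item yielded by the parser is a SeqRecord object.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))
```

Changing the format name in the second argument is, in principle, all 
that is needed to read a different file format into the same kind of 
object.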

> Or should I use other packages like CoreBio?

You could do - it has the advantage of having started recently from a 
clean slate, and having much less "old code".

> Thank you in advance for any guidelines,
> Janek Kosinski

Peter


