[Biopython-dev] [BioPython] what to use for working with fasta sequences and alignments?

Thu Jan 11 12:11:11 UTC 2007

Are you going to fix this in the new SeqIO?

When using Bio.SeqIO.FASTA.FastaReader, the sequence names are 
truncated at the first space.

Janek

Peter (BioPython List) wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> I am quite new to BioPython and I am a little confused when 
>> trying to use it to work with FASTA sequences and alignments.
>>
>> For instance, I can read and parse FASTA files with Bio.Fasta, get 
>> records back (as Fasta.Record objects), iterate over them and so on. 
>> But then there is the Bio.Fasta.FastaAlign module, which offers the 
>> FastaAlignment class (a subclass of Alignment). This class has very 
>> few methods, and get_all_seqs and get_seq_by_num return SeqRecord 
>> objects instead of Fasta.Record objects (why?). That makes it hard 
>> to combine Bio.Fasta.FastaAlign (with SeqRecord) for alignments and 
>> Bio.Fasta (with Fasta.Record) for sequences. Maybe I am wrong, but 
>> Biopython seems to be full of incompatibilities. Or is one supposed 
>> to know which modules and classes should not be used?
>>
>> Could you recommend what I should use for my work with FASTA 
>> sequences and alignments? Which BioPython modules and classes?
>
> You can use Bio.Fasta to read in files either as Fasta.Record objects 
> or as SeqRecord objects.  I would use SeqRecord objects: they are 
> more general, should you ever want to use a different input file 
> format, and, as you have noticed, the alignment object also uses 
> SeqRecord objects to hold each (gapped) sequence.
>
> There are other options if you search the code - but Bio.Fasta is the 
> best documented and most used.
>
> If you are brave, then you might have a look at the new code in 
> Bio.SeqIO which you can get from CVS.  This is still in a state of 
> flux however... but the Fasta parsing is much faster.  See this page 
> and the mailing list archives for more:
>
> http://www.biopython.org/wiki/SeqIO
>
>> Or should I use other packages like CoreBio?
>
> You could do - it has the advantage of having started recently from a 
> clean slate, and having much less "old code".
>
>> Thank you in advance for any guidance,
>> Janek Kosinski
>
> Peter