[BioPython] reading and cleaning fasta sequences

fab babasa fababasa@yahoo.fr
Tue, 3 Dec 2002 15:21:36 +0100 (CET)


Hello,
I'm just starting with Python, so I'm not very
efficient yet!
For a set of sequences (a hundred or so) in FASTA
format, I would like to:
1/ get the length of each sequence,
2/ get the size of the polyT (or polyA) stretch, if one
is present,
3/ see whether there are masked regions, indicated by X,
and calculate the size of the sequence between them,
for example:
...actggccttXXXXXXXX<...size I want to
know....>XXXXXXtttccggattgg...
If there are none, report that the sequence has no X
regions,
4/ calculate the size of the sequence between the X
masked regions, excluding the polyT (or polyA),
5/ produce an output text file with:
- the number of sequences present in the input file
- the name of the sequence
- the length of the total sequence
- the length of the polyT, if present
- the length of the sequence between X masked regions
- the length of the sequence between X masked regions,
without the polyT

and so on for every sequence in the input file
(a hundred or so sequences...)
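
To make the steps above concrete, here is a minimal sketch in
plain Python (standard library only, no Biopython calls). It is
an illustration under stated assumptions, not a definitive
implementation. The function names (parse_fasta, analyze,
report, write_report) and the MIN_RUN threshold are hypothetical.
The sketch assumes that a polyA/polyT stretch means the longest
run of A or of T of at least MIN_RUN bases anywhere in the
sequence, and that "between X regions" means the stretch lying
between two consecutive runs of X.

```python
import re

MIN_RUN = 10  # assumed threshold: shorter A/T runs are not counted as polyA/polyT


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_poly_run(seq, min_run=MIN_RUN):
    """Length of the longest run of A or of T, or 0 if it is shorter than min_run."""
    runs = [len(m.group()) for m in re.finditer(r"A+|T+", seq.upper())]
    best = max(runs, default=0)
    return best if best >= min_run else 0


def analyze(seq):
    """Compute the per-sequence figures asked for in steps 1 to 4."""
    seq = seq.upper()
    masked = [m.span() for m in re.finditer(r"X+", seq)]
    # Stretches lying strictly between two consecutive masked regions.
    inserts = [seq[a_end:b_start]
               for (_, a_end), (b_start, _) in zip(masked, masked[1:])]
    return {
        "length": len(seq),
        "poly": longest_poly_run(seq),
        "masked_regions": len(masked),
        "between": [len(s) for s in inserts],
        "between_no_poly": [len(s) - longest_poly_run(s) for s in inserts],
    }


def report(fasta_lines):
    """Build the text report of step 5 as a single string."""
    records = list(parse_fasta(fasta_lines))
    out = ["Number of sequences: %d" % len(records)]
    for name, seq in records:
        r = analyze(seq)
        out.append("")
        out.append("Name: %s" % name)
        out.append("Total length: %d" % r["length"])
        out.append("PolyA/polyT length: %s" % (r["poly"] or "none found"))
        if r["masked_regions"] == 0:
            out.append("No X regions in this sequence")
        else:
            out.append("X regions: %d" % r["masked_regions"])
            out.append("Lengths between X regions: %s" % r["between"])
            out.append("Same, without polyA/polyT: %s" % r["between_no_poly"])
    return "\n".join(out) + "\n"


def write_report(in_path, out_path):
    """Read a FASTA file and write the report to a text file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(report(src))
```

Calling write_report("input.fasta", "report.txt") (both file
names are placeholders) would produce the output file. When a
sequence has only one X region, the two "between" lists are
empty, since there is no stretch enclosed by two masked regions.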

It's very complex for me and I hope someone will help
me!

Thanks a lot 


