[Biopython-dev] New: sequtils.py

Tue Jul 24 09:09:04 EDT 2001

Hi all,

After the Biopython BoF meeting at ISMB01 in Copenhagen, we decided to
temporarily collect sequence utilities/functions in Bio/sequtils.py.
Cessie (our new Biopython member) and I started by collecting some functions
(some of them are just aliases to existing but deeply hidden functions).

Currently included:
	  ProteinX, makeTableX for error-free translation of ambiguous DNA
	  complement, reverse, antiparallel and translate
	  nice six_frame_translations à la DNA Strider/XBBtools
	  GC, GC123, GC_skew, Accumulated_GC_skew
	  fasta_uniqids for making the identifiers in a FASTA file unique (useful when running clustalw)
	  quick_FASTA_reader for reading huge FASTA files (e.g. genomes)
	  apply_on_multi_fasta: apply any function (e.g. GC) to all entries in a multiple FASTA file
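To make the sequence-statistics part of the list concrete, here is a minimal sketch of a few of these helpers working on plain Python strings. The names mirror the list above, but the bodies, the window parameter and its default, and the return types are assumptions for illustration, not the code actually in Bio/sequtils.py. GC_skew is computed as (G - C) / (G + C) per non-overlapping window, and the accumulated version is just its running sum.

```python
from itertools import accumulate

# Complement table for unambiguous DNA plus N (an assumed, simplified alphabet).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Return the base-by-base complement of a DNA string."""
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    """Return the sequence reversed, without complementing."""
    return seq[::-1]


def antiparallel(seq):
    """Return the reverse complement, i.e. the other strand read 5'->3'."""
    return reverse(complement(seq))


def GC(seq):
    """Return G+C content as a percentage (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def GC_skew(seq, window=100):
    """Return (G - C) / (G + C) for each non-overlapping window.

    Windows containing no G or C get a skew of 0.0.
    The default window size is a hypothetical choice.
    """
    s = seq.upper()
    skews = []
    for start in range(0, len(s), window):
        chunk = s[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def Accumulated_GC_skew(seq, window=100):
    """Return the running sum of the per-window GC skew values."""
    return list(accumulate(GC_skew(seq, window)))


if __name__ == "__main__":
    dna = "ATGGGCCCATTA"
    print(antiparallel(dna))                   # TAATGGGCCCAT
    print(GC(dna))                             # 50.0
    print(GC_skew(dna, window=6))              # [0.5, -1.0]
    print(Accumulated_GC_skew(dna, window=6))  # [0.5, -0.5]
```

Working on plain strings keeps the sketch independent of the Seq/alphabet classes; a real version would probably accept Seq objects as well.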

Questions: 
1) Should we move ProteinX and makeTableX somewhere else?
2) We included a quick_FASTA_reader hack. The FASTA parser is cool and nice,
   but because of all its checks it takes ages on, e.g., a complete genome.
   Should we create a faster alternative (compatible with the normal one)?
3) Some functions already exist in utils.py. Could we move the sequence-based
   functions to sequtils.py and use utils.py for other, non-sequence-based functions?
   (e.g. I'd like to put my hypergeometric distribution code there for expression data)
4) Anyone got a hangover from yesterday's banquet?
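Regarding question 2, one possible shape for a faster reader is a generator that streams the file line by line and does no validation at all, yielding plain (title, sequence) tuples. This is a sketch under assumptions: the names mirror quick_FASTA_reader and apply_on_multi_fasta from the list above, but taking an open handle (rather than a filename) and returning tuples are hypothetical choices, not the existing sequtils.py code or the FASTA parser's API.

```python
import io


def quick_FASTA_reader(handle):
    """Yield (title, sequence) pairs from an open FASTA handle.

    No alphabet checks and no record objects are built; speed on very
    large files (e.g. genomes) is traded for safety. Lines before the
    first '>' header and blank lines are ignored.
    """
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                # Join once per record: avoids quadratic string concatenation.
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def apply_on_multi_fasta(handle, function):
    """Return [(title, function(sequence)), ...] for every FASTA entry."""
    return [(title, function(seq)) for title, seq in quick_FASTA_reader(handle)]


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 test\nACGT\nGGCC\n>seq2\nAATT\n")
    print(apply_on_multi_fasta(fasta, len))  # [('seq1 test', 8), ('seq2', 4)]
```

Because it is a generator, only one record is held in memory at a time; a "compatible" version would wrap each tuple in whatever record object the normal parser returns.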

cheers
-thomas

Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas at biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

	De Chelonian Mobile ... The Turtle Moves ...