[Biopython-dev] Cleaning up Bio.SeqUtils

Thu Sep 25 22:47:58 UTC 2008

Peter, can you check in the corrected version of quick_FASTA_reader for 
me? I added the changes suggested in earlier posts (only those that do 
not affect speed or simplicity):

def quick_FASTA_reader(file):
    "simple and quick FASTA reader to be used on large FASTA files"
    # open() in text mode normally translates line endings to '\n', so
    # split on '\n>' instead of os.linesep (which is '\r\n' on Windows).
    with open(file) as handle:
        txt = handle.read()
    entries = []
    for entry in txt.split('\n>'):
        name, seq = entry.split('\n', 1)
        if name[0] == '>': name = name[1:]
        seq = seq.replace('\n', '').replace(' ', '').upper()
        entries.append((name, seq))

    return entries
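
To illustrate the splitting bug discussed in this thread (an entry name 
that happens to contain '>'), here is a minimal sketch. It assumes the 
FASTA text is already in memory; split_fasta_text is a hypothetical 
helper written for this example and is not part of Bio.SeqUtils. It 
compares a naive split on '>' alone with a split that only accepts '>' 
at the start of a line.

```python
def split_fasta_text(txt):
    """Split FASTA text into (name, sequence) tuples.

    Hypothetical helper for illustration; not part of Bio.SeqUtils.
    """
    entries = []
    # Normalise line endings so the split does not depend on the platform.
    txt = txt.replace("\r\n", "\n").strip()
    if not txt:
        return entries
    # Split only where '>' starts a line, so a '>' inside a name is kept.
    for entry in ("\n" + txt).split("\n>")[1:]:
        name, _, seq = entry.partition("\n")
        seq = seq.replace("\n", "").replace(" ", "").upper()
        entries.append((name, seq))
    return entries


# The first entry name contains '>' ("A->B").
example = ">seq1 A->B mutant\nacgt\nac gt\n>seq2\nttga\n"
records = split_fasta_text(example)

# A naive split on '>' alone cuts the first name in two.
naive_pieces = example.split(">")
```

With this input, records holds two entries and the first name keeps its 
'>' intact, while naive_pieces has an extra fragment because the name 
"seq1 A->B mutant" was cut at the embedded '>'.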

Concerning the seq3 function, I am not sure where it came from; I don't 
think I added it.

cheers
-thomas

Peter wrote:
> On Thu, Sep 25, 2008 at 7:57 PM, Thomas Sicheritz-Ponten
> <thomas at cbs.dtu.dk> wrote:
>> Hej all,
>>
>> as I am guilty of most of the functions in SeqUtils/__init__.py, I might as
>> well join the cleaning team ...
> 
> Excellent :)
> 
>> apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to
>> turn the original SeqUtils.py into a possible standalone program, but I
>> guess not many actually used it.
> 
> That would explain some of that module's style.  We could deprecate
> the standalone bit too when we deprecate these functions.
> 
>> On the other hand quick_FASTA_reader was and still is used by a lot of
>> people, despite the irritating splitting bug which occurs if an entry name
>> happens to contain '>' ...
> 
> We should probably fix that if you think it can be done without
> losing the current simplicity and speed (see below).
> 
>> Also, the translate and complement functions are from the time when these
>> functions were not easily accessed (we are talking about 2001-2002)
> 
> That does make sense - it's a shame with hindsight that Biopython ended
> up with several ways to do this.
> 
>> In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the
>> redundant translation machinery could and should get removed.
> 
> OK.  We should probably ask on the main list as a courtesy, and then
> deprecate them for the next release.
> 
>> Also, could someone change the split function in quick_FASTA_reader? (I
>> haven't had checkin access for a long time)
> 
> If this is just an expired account / lost password you could try
> emailing the OBF support guys directly.  If they need someone to vouch
> for you drop me or Michiel an email off list.  In the short term I'm
> happy to check in a patch on your behalf (by email or via a bug
> report).
> 

>> Are there any other dubious functions we should discuss?
> 
> I'm sure there are more - but that should keep us busy for now :)
> 
> Are you happy with my recent tweak to the seq3 function (CVS revision
> 1.15)?  I wasn't 100% sure why it had used "Xer"
> 
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py.diff?r1=1.14&r2=1.15&cvsroot=biopython
> 
> Thanks,
> 
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

-- 
Sicheritz-Ponten Thomas, Associate Professor, Ph.D       (
Head of Metagenomics, Technical University of Denmark     \
Center for Biological Sequence Analysis, BioCentrum        )
CBS: +45 45 252422      Building 208, DK-2800 Lyngby  ##----->
Fax: +45 45 931585      http://www.cbs.dtu.dk/~thomas      )
                                                           /
      ... damn arrow eating trees ...                     (