[Biopython-dev] Cleaning up Bio.SeqUtils
Thomas Sicheritz-Ponten
thomas at cbs.dtu.dk
Thu Sep 25 22:47:58 UTC 2008
Peter, can you check in the corrected version of quick_FASTA_reader for
me? I added the changes that were suggested in earlier posts (those that
do not affect speed or simplicity):
def quick_FASTA_reader(file):
    "simple and quick FASTA reader to be used on large FASTA files"
    # Text-mode reads already translate line endings to '\n', so split
    # on '\n' rather than os.linesep (which is '\r\n' on Windows).
    with open(file) as handle:
        txt = handle.read()
    entries = []
    # Split on newline + '>' so a '>' inside an entry name is harmless.
    for entry in txt.split("\n>"):
        name, seq = entry.split("\n", 1)
        if name[0] == '>': name = name[1:]
        seq = seq.replace('\n', '').replace(' ', '').upper()
        entries.append((name, seq))
    return entries
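
To illustrate the splitting issue discussed in this thread, here is a minimal sketch that works on an in-memory string instead of a file. The helper name parse_fasta_text and the sample data are hypothetical, and the "naive" split on every '>' is only an assumption about how the bug would arise; it is not taken from the SeqUtils source.

```python
def parse_fasta_text(txt):
    """Split FASTA text on newline + '>' so a '>' inside a name is harmless.

    Hypothetical helper for illustration; not part of Bio.SeqUtils.
    """
    entries = []
    for entry in txt.split("\n>"):
        if not entry.strip():
            continue  # skip empty chunks, e.g. from an empty input
        # partition() tolerates an entry that has a name but no sequence
        name, _, seq = entry.partition("\n")
        if name.startswith(">"):
            name = name[1:]  # only the very first entry keeps its '>'
        seq = seq.replace("\n", "").replace("\r", "").replace(" ", "").upper()
        entries.append((name, seq))
    return entries


# Made-up sample: the second entry name contains a '>' character.
sample = ">seq1 first\nacgt\nAC GT\n>seq2 A->B mutant\nggcc\n"

# Assumed naive approach: splitting on every '>' cuts the second name in two.
naive_chunks = [chunk for chunk in sample.split(">") if chunk]

# Anchoring the split to the start of a line keeps the name intact.
fixed = parse_fasta_text(sample)
```

With this sample, the naive split yields three chunks for two records, while the line-anchored split returns one (name, sequence) pair per record.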
Concerning the seq3 function, I am not sure where it came from; I don't
think I added it.
cheers
-thomas
Peter wrote:
> On Thu, Sep 25, 2008 at 7:57 PM, Thomas Sicheritz-Ponten
> <thomas at cbs.dtu.dk> wrote:
>> Hej all,
>>
>> as I am guilty of most of the functions in SeqUtils/__init__.py, I might as
>> well join the cleaning team ...
>
> Excellent :)
>
>> apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to
>> turn the original SeqUtils.py into a possible standalone program, but I
>> guess not many people actually used it.
>
> That would explain some of that module's style. We could deprecate
> the standalone bit too when we deprecate these functions.
>
>> On the other hand quick_FASTA_reader was and still is used by a lot of
>> people, despite the irritating splitting bug which occurs if an entry name
>> happens to contain '>' ...
>
> We should probably fix that if you think it can be done without
> losing the current simplicity and speed (see below).
>
>> Also, the translate and complement functions are from the time when these
>> functions were not easily accessible (we are talking about 2001-2002).
>
> That does make sense - it's a shame with hindsight that Biopython ended
> up with several ways to do this.
>
>> In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the
>> redundant translation machinery could and should get removed.
>
> OK. We should probably ask on the main list as a courtesy, and then
> deprecate them for the next release.
>
>> Also, could someone change the split function in quick_FASTA_reader? (I
>> haven't had check-in access for a long time.)
>
> If this is just an expired account / lost password you could try
> emailing the OBF support guys directly. If they need someone to vouch
> for you drop me or Michiel an email off list. In the short term I'm
> happy to check in a patch on your behalf (by email or via a bug
> report).
>
>> Are there any other dubious functions we should discuss?
>
> I'm sure there are more - but that should keep us busy for now :)
>
> Are you happy with my recent tweak to the seq3 function (CVS revision
> 1.15)? I wasn't 100% sure why it had used "Xer"
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py.diff?r1=1.14&r2=1.15&cvsroot=biopython
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
--
Sicheritz-Ponten Thomas, Associate Professor, Ph.D (
Head of Metagenomics, Technical University of Denmark \
Center for Biological Sequence Analysis, BioCentrum )
CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##----->
Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas )
/
... damn arrow eating trees ... (