[Biopython-dev] BioPython Design

Colosimo, Marc E. mcolosimo at mitre.org
Sun Jul 2 15:12:23 EDT 2006


Michiel,

When will this next release be made and what is going into it?

Since you brought up the issue of design questions, I'll have my little rant
now. But first, I would like to say that I think it is great that people
contribute code and, more importantly, their time to this project. Without
all of the core developers there would be no BioPython. So, kudos to anyone
who has contributed code. Now on to my rant....

<rant>
I'm not a big user of either BioPerl or BioJava. However, they are well
structured and more consistent than BioPython. This FastaIO issue is one of
several design issues that really need to be addressed.

For example, both BioPerl and BioJava use a SeqIO object structure. Our
SeqIO module is heavily underused. Instead, we have separate top-level
modules: Fasta, GenBank, LocusLink, NBRF, SwissProt, UniGene. Interestingly,
there is a writers.SeqRecord.embl but I can't quickly find something to read
in an EMBL file!

Just look at what BioPerl can read in
<http://www.bioperl.org/wiki/HOWTO:SeqIO> and how easy it is to find this
out (even without the doc page, all of these are listed under
Bio::SeqIO::*).
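
A rough sketch of that idea, in case it helps the discussion: one entry
point that dispatches on a format name, so the supported formats can be
listed in one place. Every name here (register, parse, supported_formats,
_PARSERS) is hypothetical and not part of BioPython's existing API. The
FASTA reader is a deliberately simple line-by-line one that yields plain
(title, sequence) tuples rather than SeqRecord objects, and the code is
written for a current Python rather than 2.4.

```python
import io

# Hypothetical registry: format name -> parser function.
_PARSERS = {}


def register(format_name):
    """Decorator that files a parser function under a format name."""
    def decorator(func):
        _PARSERS[format_name.lower()] = func
        return func
    return decorator


@register("fasta")
def _parse_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA handle, line by line."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if title is not None:
        yield title, "".join(chunks)


def parse(handle, format_name):
    """Single entry point: look up the parser by format name."""
    try:
        parser = _PARSERS[format_name.lower()]
    except KeyError:
        raise ValueError("Unknown format: %r" % format_name)
    return parser(handle)


def supported_formats():
    """Return the registered format names, sorted."""
    return sorted(_PARSERS)


# Example with an in-memory file instead of a real FASTA file.
example = io.StringIO(">seq1 test\nACGT\nTTAA\n>seq2\nGG\n")
records = list(parse(example, "fasta"))
```

Adding an EMBL or GenBank reader would then only mean registering one more
function, and supported_formats() would answer the "what can we read?"
question without a trip to the docs.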

There is a very short "Coding Convention"
<http://biopython.org/wiki/Contributing#Coding_conventions>, which doesn't
seem to be followed all that well.

My suggestion is that, if enough people are going to ISMB this year (which I
am not), time should be made to think about a road map for BioPython.

My suggestions are:
1) split off a branch for ver 2.0 that supports Python 2.4 only (this would
suck for Mac people, like me, but it's time to move on)
2) clean house - remove deprecated items, restructure IO, etc.
3) move to SciPy/NumPy instead of Numeric (could try "numpy/lib/convertcode.py")
4) use Cheese Shop for missing modules
5) documentation

</rant>

marc

On 7/2/06 12:43 AM, "Michiel de Hoon" <mdehoon at c2b2.columbia.edu> wrote:

> Thanks Iddo!
> I tried the parser in Bio.SeqIO.FASTA and it is indeed a lot faster than
> the Martel-based one in Bio.Fasta.
> 
> It would be nice to merge these two modules. However, it raises a bunch
> of design questions (such as Fasta.Record versus SeqRecord, and Seq
> versus string), so it's probably better to wait with that until after
> the next Biopython release. Which, by the way, will be coming up soon.
> 
> Thanks,
> 
> --Michiel.
> 
> Iddo Friedberg wrote:
>> Michiel,
>> 
>> There is actually a simple-minded Fasta reader/writer that does not use
>> Martel: Bio.SeqIO.FASTA
>> 
>> ./I
>> 
>> --
>> Iddo Friedberg, PhD
>> Burnham Institute for Medical Research
>> 10901 N. Torrey Pines Rd.
>> La Jolla, CA 92037 USA
>> T: +1 858 646 3100 x3516
>> http://iddo-friedberg.org
>> http://BioFunctionPrediction.org
>> 
>> 
>> 
>> -----Original Message-----
>> From: biopython-dev-bounces at lists.open-bio.org on behalf of Michiel de Hoon
>> Sent: Sat 7/1/2006 2:47 PM
>> To: biopython-dev at biopython.org
>> Subject: [Biopython-dev] Fasta parser
>> 
>> Hi everybody,
>> 
>> The Biopython documentation shows the following approach to parsing a Fasta file:
>> 
>> >>> from Bio import Fasta
>> >>> parser = Fasta.RecordParser()
>> >>> file = open("ls_orchid.fasta")
>> >>> iterator = Fasta.Iterator(file, parser)
>> >>> cur_record = iterator.next()
>> 
>> But for large Fasta files, it's very slow, compared to file.read(),
>> which may be due to going through Martel (I believe the same was true
>> for large GenBank files).
>> 
>> So I'm thinking about writing a simple-minded Fasta parser for better
>> performance with large files. What I'm wondering about:
>> 1) Is there some advantage that I overlooked of using Martel for parsing
>> Fasta files?
>> 2) Why is it necessary to create a parser first and pass it to
>> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
>> other than a Fasta.RecordParser?
>> 
>> --Michiel.
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> 