From dalke at acm.org  Mon Jan  8 02:57:45 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
Message-ID: <001801c07948$b498e8a0$26ac323f@josiah>

New version of Martel at
  http://www.biopython.org/~dalke/Martel/Martel-0.5.tar.gz

Biggest change is support for a lot more formats.
The current list is:


 GenBank - incomplete support for release 119 (does not parse
                 some records; need to resynch with Brad's work)
 MDL_10_1996 - MDL's .mol format, as described in documentation
                    dated 10/1996 and tested against WDI and ACD
                    (thanks Daylight!)

 PDB_2_1  - tested against 1PLM and will fail against most PDB files.
 PIR_3_0  - tested against all of PIR
 blastall_2_0_10 - tested against the non-master-slave 'blastall 2.0.10'
                   records from biopython.Bio.Tests.Blast (thanks Jeff!)
 blocks_12  - tested against Blocks 12.0 format
 blocksplus - tested against Blocks+ 15Nov00 format
 embl65   - tested against hum7 from EMBL Release 65, December 2000
 enzyme26  - ENZYME nomenclature database, release 26 of May 2000
 primers  - PCR primers from EBI; database date is from 1997
 prodoc16  - the format version is 16, tested against Prosite release 39
 prosite16  - the format version is 16, tested against Prosite release 39
 swissprot38 - tested against all of SWISS-PROT 38.  Pretty complete.
 taxonomy  - NCBI's taxonomy database, dated 05-NOV-1998
 trembl14  - tested against hum and mam of TrEMBL release 14, June 2000

They still need builders to create biopython data structures.

- Changes between 0.5 and 0.4

Bug fix where HeaderFooter and ParseRecords weren't copying their
subexpressions when making a .copy()

Added SignedInteger and Float definitions.

Fixed some problems with the error reporting. (Status message,
location offset, rewrote HeaderFooterParser.)

Replaced \n with \R in the various format definitions.

Added a lot of formats, although most are incomplete in that
additional fields could be parsed.

Fixed swissprot parsing so non-existent fields don't generate empty
tags.

                    Andrew


From katel at worldpath.net  Tue Jan  9 02:19:19 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
References: <001801c07948$b498e8a0$26ac323f@josiah>
Message-ID: <003201c07a0c$7ea4ce20$010a0a0a@cadence.com>

   Does the current version of Martel support backtracking?  I've been
experimenting with UniGene.  It's more variable than GenBank, so backtracking
would be useful.  It could be an option if it impacts performance too much.

  The parser gets stuck on this line:
UniGene Cluster Hs.222015


  The expression is:
unigene_title = Martel.Group( "unigene_title",
    Martel.Str( "UniGene Cluster " ) +
    Martel.Re( "[A-Z]" ) + Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) +
    Martel.AnyEol() )

  After this it goes into a loop until it runs out of characters.


                                              Cayte


From dalke at acm.org  Tue Jan  9 00:02:32 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
Message-ID: <010701c079f9$60a33460$58ac323f@josiah>

Cayte <katel@worldpath.net>:
>   Does the current version of Martel support backtracking?

Sadly, no more than it ever did.  There is no backtracking
with the "*" operator.  I haven't been clever enough in how
to use mxTextTools to support that ability.  But so far there
have been ways around it.
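Here is a small illustration of what "no backtracking with *" means in practice. This is a sketch in plain Python, not Martel or mxTextTools code, and `greedy_no_backtrack` is a hypothetical helper. Python's `re` lets `[a-z]*` give characters back so the rest of the pattern can match. A star that never gives anything back fails on the same input, unless the pattern is rewritten so the repeated part cannot consume what must follow it.

```python
import re

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def greedy_no_backtrack(text, star_chars, then_char):
    """Match `[star_chars]*` followed by `then_char` at position 0.

    Hypothetical helper: the star consumes as many characters as it can
    and never gives any back. Returns the end offset of the match, or None.
    """
    pos = 0
    while pos < len(text) and text[pos] in star_chars:
        pos += 1
    # No backtracking: if the next character is wrong, the whole match fails.
    if pos < len(text) and text[pos] == then_char:
        return pos + 1
    return None

# Python's re backtracks: [a-z]* gives back the final "o" so "o" can match.
print(re.match(r"[a-z]*o", "hello").group())
# Without backtracking, [a-z]* swallows the "o" and the match fails.
print(greedy_no_backtrack("hello", LETTERS, "o"))
# Workaround: keep the starred class from consuming what must follow it.
print(greedy_no_backtrack("hello", LETTERS.replace("o", ""), "o"))
```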

>  The parser gets stuck on this line:
>UniGene Cluster Hs.222015
>
>  The expression is:
>unigene_title = Martel.Group( "unigene_title", Martel.Str(
> "UniGene Cluster " ) +
>    Martel.Re( "[A-Z]" ) + Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) +
>    Martel.AnyEol() )
>
>  After this it goes into a loop until it runs out of characters.

I can't see why it would do that there.  Every operation must
consume at least a character so it can't be stuck in an infinite
loop.  The only operator to consume newlines is the AnyEol so
at most it should read up until the end of a line.

Have you tried using the make_parser(debug_level = 2) option to
see which operation is consuming characters?

Also, you can merge the Re operations into one, as in
  Martel.Re(r"[A-Z][a-z]\.\d+") + Martel.AnyEol()

or even use \R at the end of the pattern to replace the AnyEol.
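You can check that the merged pattern behaves the same as the separate pieces by trying both with Python's own `re` module. This sketch does not use Martel. Since `re` has no `\R`, it substitutes the expansion `(\n|\r\n?)` that appears in Martel's debug output in this archive, written here as a non-capturing group. Treating that as exactly what Martel's `\R` means is an assumption.

```python
import re

# Assumed stand-in for Martel's \R: any of "\n", "\r\n" or "\r".
EOL = r"(?:\n|\r\n?)"

# The original expression, piece by piece...
pieces = [r"UniGene Cluster ", r"[A-Z]", r"[a-z]", r"\.\d+", EOL]
separate = re.compile("".join(pieces))
# ...and the single merged pattern suggested above.
merged = re.compile(r"UniGene Cluster [A-Z][a-z]\.\d+" + EOL)

for line in ["UniGene Cluster Hs.222015\n",
             "UniGene Cluster Hs.222015\r\n",
             "UniGene Cluster Hs.222015\r"]:
    m1, m2 = separate.match(line), merged.match(line)
    print(repr(line), m1.group() == m2.group() == line)
```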

I just tested your expression out and it seems to work fine for
me.  Here's what I did:

>>> import Martel
>>> unigene_title = Martel.Group( "unigene_title",
...     Martel.Str( "UniGene Cluster ") + Martel.Re( "[A-Z]" ) +
...     Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) + Martel.AnyEol())
>>> parser = unigene_title.make_parser()
>>> from Martel.test import support
>>> parser.setContentHandler(support.Dump())
>>> parser.parseString("UniGene Cluster Hs.222015\n")
-------> Start
<unigene_title>UniGene Cluster Hs.222015
</unigene_title>
-------> End

If you still can't get it working, email me what you have and
I'll take a closer look at it.

                    Andrew


From johann at egenetics.com  Tue Jan  9 08:22:29 2001
From: johann at egenetics.com (Johann Visagie)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
In-Reply-To: <001801c07948$b498e8a0$26ac323f@josiah>; from dalke@acm.org on Mon, Jan 08, 2001 at 12:57:45AM -0700
References: <001801c07948$b498e8a0$26ac323f@josiah>
Message-ID: <20010109152229.B72822@fling.sanbi.ac.za>

Hi Andrew,

Andrew Dalke on 2001-01-08 (Mon) at 00:57:45 -0700:
> 
> New version of Martel at
>   http://www.biopython.org/~dalke/Martel/Martel-0.5.tar.gz

Excellent!  Thanks!

BTW, it seems the small bugfix patch against Generate.py which you posted to
this list on 6 December did not make it into 0.5(?)

-- Johann

From dalke at acm.org  Tue Jan  9 12:31:22 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
Message-ID: <00c801c07a62$55eaaa80$f4ab323f@josiah>

Johann:
>BTW, it seems the small bugfix patch against Generate.py
>which you posted to this list on 6 December did not make it into 0.5(?)

Good catch, but what actually happened was that I withdrew
that patch because of its huge performance impact.  The details
are given in:

http://www.biopython.org/pipermail/biopython-dev/2000-December/000215.html

I pulled the patch and changed the PIR format definition (see
http://www.biopython.org/~dalke/Martel/Martel/formats/PIR_3_0.py )
to:

# The hack is to work around a Martel bug
sequence_data = Rep1(Re("   (?P<start_pos>.....)") + \
                     Rep(Group("hack", sequence_punc_letter)) + \
                     Opt(sequence_punc) + AnyEol())

My current goal is to make sure the Martel behaviour for valid
matches is a proper subset of regexp behaviour and wait for
someone to write a full engine  :)

                    Andrew
                    dalke@acm.org


From chapmanb at arches.uga.edu  Tue Jan  9 18:05:53 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Second go at GenBank parser
In-Reply-To: <Pine.GSO.4.21.0012201610060.29393-100000@riboweb.Stanford.EDU>
References: <14912.52277.659316.598153@taxus.athen1.ga.home.com>
	<Pine.GSO.4.21.0012201610060.29393-100000@riboweb.Stanford.EDU>
Message-ID: <14939.39249.775192.733606@taxus.athen1.ga.home.com>

Hi Jeff!
   Thanks for getting back with me. Sorry I have been so slow. I was
away over the break and then was so excited to code when I got back
that I jumped right into working hard-core on Biocorba (more on
that fairly soon :-).

Jeff:
> This is great!  You've filled two gaping holes in biopython functionality.  
> Please check these in, as I'm sure people will want to start using the
> code.

Okee dokee, I would be more than happy to do this. Are there any
objections from anyone before I do it? I also am not totally clear
about where everything should go (more on that below). 

me:
> > the dreaded "fake /" cases
> > (found some more hideous ones like that in a bacterial
> > dataset). GenBank, wow, what a headache!
Jeff:
> Good.  GenBank is notoriously hard to deal with, and I suspect work on the
> format will be ongoing.

I hope so -- it will be good to get it in CVS so others can look at
it. I'm not very happy with my fix (it seems pretty inefficient to
me), but it was the best my small mind could come up with. Once it is
in there all of the brilliant minds at biopython can have a go at it :-).

me:
> > o Naming of modules -- right now my naming sucks (the "supplementary"
> > feature classes, like Location.py and Reference.py are in a module
> > called 'FeatureInfo', for instance. yeck.), so if people have good
> > ideas for how to name things I'll definitely take 'em.
Jeff:
> Are these meant to be used with SeqFeatures?  If so, how about just
> SeqFeature.Location and SeqFeature.Reference?

Do you mean put them all in the SeqFeature.py module? That sounds like 
a fine solution to me (just wanted to be sure I understand you). 

> > I'm also not sure where a good place for spark.py to live in Biopython
> > is (BTW, I think we should include it :-).
> 
> Where you have it now seems as good a place as any (without the
> PGML).  Including it is fine with me.

Okay -- I'll stick it there.

> > Finally, I noticed Jeff put his snazzy code in GenBank/__init__.py --
> > Should my GenBank.py go into __init__.py?
> 
> Yes.  GenBank is a good name for it, and as per Andrew's earlier email, we
> should avoid having code in both GenBank/__init__.py and
> GenBank/GenBank.py.

Okay, so you want me to integrate it with your __init__.py stuff? That 
is no problem, I just wanted to be sure. I definitely want to avoid
Andrew's __init__.py/GenBank.py-type problem.

One thing -- I added the ability to index files as a Dictionary (a la
the other Parsers). Is it too confusing to having Dictionary and
NCBIDictionary in the same module? Just curious.

> Are the HTML-formatted files different?  Does it work if you just strip
> the HTML tags?  I guess for HTML-formatted data from GenBank, it would be
> nice to handle, but very low priority.  HTML-formatted data from other
> sources, no.  If someone needs that functionality, they can submit the
> patches!  :)

Most of the entry is in <pre> tags so it is not too bad, but I think
there will be some tricky issues because some of the feature names
have links in them -- this will be hard, especially considering how 
important whitespace is in the feature table. So I think I'll forgo
this and maybe if it turns out to be easy someone else will patch it
for me :-).

Thanks again for getting back with me. I will try to write up some
docs tonight, so I hope it'll be ready to go in whenever I'm sure
where to put things :-)

Brad


From dalke at acm.org  Wed Jan 10 04:12:40 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] swissprot parsing performance comparisons
Message-ID: <000301c07ae5$905bab00$46ac323f@josiah>

My paper about Martel for the Python conference was
accepted.  One of the reviewers wanted more comparisons
with existing projects so I've been doing that.  In
case you are interested, here are the timings for parsing
SWISS-PROT release 38:

  Time  Method             Description
  ----  ------             -------------------------------
0:56.63 grep.time          grep ^ID 
1:27.89 rec_reader.time    Martel's RecordReader.StartsWith "ID"
1:47.71 swissknife.time    lazy Swissknife (doesn't parse the fields)
1:59.78 swissknife.time    ditto, to check reproducibility
6:41.14 swissprot38_no_tags.time   Martel but without tag elements
8:47.09 swissknife_id_sq.time   Swissknife, extracting ID and SQ
9:22.43 swissprot38_id_sq.time  Martel with entry_name & sequence tags
23:28.59 SwissProtBuilder.time  Martel building Biopython's SP records
28:54.69 biopython.time         Biopython building its own SP records
30:12.85 bioperl.time           Bioperl building its own SP records
38:20.65 swissknife_full.time   Swissknife with full parsing enabled

Some notes:
  - I like that the RecordReader is only 50% slower than grep!
  - Swissknife is at ftp://ftp.ebi.ac.uk/pub/software/swissprot/
  - Swissknife contains a performance problem when reading SQ
     records.  I commented out some of the problem and sent email
     to the authors about it.
  - the ID & SQ runs emulate the minimum parsing needed for
     FASTA generation
  - Martel's SwissProtBuilder imports the old xml libraries and
      needed to be fixed before use.
  - swissprot38_id_sq is the same program as SwissProtBuilder but
      with all of the tags removed except for "entry_name",
      "sequence" and "swissprot38_record".  (The last is present
      as a sanity counter so I know the parse is progressing.)
      Some extra performance could be gained by making a
      document handler which is more specific to the task.
  - the "SP records" are the existing biopython SwissProtRecord
  - fully 3/4 of SwissProtBuilder's time is spent in function callbacks
      for tags and in object creation, not directly in parsing.
  - the current CVS version of biopython's swissprot parser will
      not parse release 38 because it says the OX record is
      required.  Changing its "one_or_more" value to 0 fixed things.
  - bioperl and biopython likely capture somewhat different
      data so they cannot be directly compared.
  - swissknife is perhaps the least stringent parser, followed by
      bioperl.  Biopython and Martel are much pickier.
      It is hard to judge whether this reflects the natural
      inclination of libraries in the two languages, because the
      two perl packages come from the same programming "culture"
      (EBI/Sanger), as do the two python packages (their authors
      had the same employers at the same time)  :)
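The "only 50% slower than grep" note can be checked by converting the m:ss.xx timings to seconds. Below is a minimal sketch. The dictionary copies a few rows of the table above, and the ratios are plain arithmetic, not new measurements.

```python
# A few rows copied from the timing table (minutes:seconds.hundredths).
timings = {
    "grep": "0:56.63",
    "rec_reader": "1:27.89",
    "swissprot38_id_sq": "9:22.43",
    "SwissProtBuilder": "23:28.59",
    "biopython": "28:54.69",
}

def to_seconds(stamp):
    """Convert an 'm:ss.xx' string into seconds as a float."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

secs = {name: to_seconds(t) for name, t in timings.items()}

# Express every run relative to the grep baseline.
for name, s in sorted(secs.items(), key=lambda kv: kv[1]):
    print("%-18s %8.2f s  %5.2fx grep" % (name, s, s / secs["grep"]))
```

With these numbers the RecordReader comes out at roughly 1.55 times the grep time.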


                    Andrew


From dalke at acm.org  Wed Jan 10 13:16:46 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] swissprot parsing performance comparisons
Message-ID: <006501c07b31$859321c0$46ac323f@josiah>

Jeff:
>What about PySAT?
>http://www.embl-heidelberg.de/~chenna/PySAT/

Thanks for the reminder.  I have the distribution, I'll
test it out as well.

>They have support for SwissProt, and their toolkit has been
>published.  IIRC, theirs is an example of a less stringent python
>implementation of a parser.

I recall looking at their code and I agree.  It is more like
the Swissknife way of doing things.

>This is an interesting statistic, and surprises me.  I wonder what's
>slowing the perl parser, then, since it doesn't use callbacks?

The implementations do a lot of small regex parsing.  Martel does
it all at once, and at the C level.  That might be the difference.
It is hard to tell since I would need to better understand the
details of the perl implementations.

>That's embarrassing, since it's supposedly been checked against it!  How
>many entries in release 38?  Perhaps I need to update mine.

I don't know.  I ran it and it failed at a record.  I figured out
what was wrong with that record, changed the 1 to 0, and then
everything parsed fine.

>It does seem to match the philosophies of the languages...

True enough, although as you mentioned PySAT is less stringent
and more like the Perl implementations.  Something to ponder :)

                    Andrew
                    dalke@acm.org


From katel at worldpath.net  Thu Jan 11 02:37:18 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel-0.5
References: <010701c079f9$60a33460$58ac323f@josiah>
Message-ID: <006301c07ba1$5450b3a0$010a0a0a@cadence.com>

----- Original Message -----
From: "Andrew Dalke" <dalke@acm.org>
> If you still can't get it working, email me what you have and
> I'll take a closer look at it.
>
  Thank you for your response.  I need to remove some extraneous prints from
the script before I send it.

  I also need to explain my approach.  Unlike Genbank, which has a fairly
consistent format, UniGene has tables, with content that varies from record
to record.  My plan was to create a dictionary of tables.  Each entry would
contain a nested table of key value pairs, extracted from the table.  I view
the code as scaffolding until someone can suggest a better idea on how to
represent this type of table.

 My HTML stripper inserts delimiters such as ------ to preserve the information
needed about the table structure.

These are the Martel definitions.

unigene_title = Martel.Group( "unigene_title",
    Martel.Str( "UniGene Cluster " ) +
    Martel.Re( "[A-Z]" ) + Martel.Re( "[a-z]" ) + Martel.Re( "\.\d+" ) +
    Martel.AnyEol() )

descriptor = Martel.Group( "descriptor", Martel.ToEol() )
table_name = Martel.Group( "table_name", Martel.ToEol() )
start_table = Martel.Group( "start_table",
    Martel.Str( "^^^^^" ) + Martel.AnyEol() )
end_table = Martel.Group( "end_table",
    Martel.Str( "$$$$$" ) + Martel.AnyEol() )
start_row = Martel.Group( "start_row",
    Martel.Str( "=====" ) + Martel.AnyEol() )
end_row = Martel.Group( "end_row", Martel.Str( "-----" ) + Martel.AnyEol() )
row_data = Martel.Group( "row_data", Martel.Rep1( Martel.ToEol() ) )
row = Martel.Group( "row", start_row + row_data + end_row )
table_contents = Martel.Group( "table_contents", Martel.Rep1( row ) )
unigene_table = Martel.Group( "unigene_table",
    table_name + start_table + table_contents + end_table )
record_contents = Martel.Group( "record_contents",
    Martel.Rep1( unigene_table ) )
record_prefix = Martel.Group( "record_prefix",
    Martel.RepN( Martel.ToEol(), 2 ) )
unigene_record = Martel.Group( "unigene_record",
    record_prefix + unigene_title + descriptor + record_contents )
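There is one possible reason the Martel version keeps reading until it runs out of characters. If `Martel.Rep1` also does not backtrack, like the "*" operator described earlier in the thread, then `Martel.Rep1( Martel.ToEol() )` in `row_data` will swallow the `-----` line, the `$$$$$` line and everything after them. In that case `end_row` never gets a chance to match. That is a guess, not a confirmed diagnosis.

For comparison, below is a minimal line-by-line sketch in plain Python that checks for each delimiter before consuming a line. It assumes the stripper output looks exactly as the definitions describe: two prefix lines, the title, a descriptor line, then tables delimited by `^^^^^`, `=====`, `-----` and `$$$$$`. `parse_unigene` and the sample record are made up. Each row is kept as a list of lines, because the message does not say how a row splits into key/value pairs.

```python
def parse_unigene(text):
    """Hypothetical parser: return (title, descriptor, tables), where tables
    maps each table name to a list of rows and each row is a list of lines.
    Truncated input raises IndexError; this sketch does no recovery."""
    lines = text.splitlines()
    # record_prefix is two arbitrary lines, then the title and descriptor.
    title, descriptor = lines[2], lines[3]
    tables = {}
    i = 4
    while i < len(lines):
        name = lines[i]
        if lines[i + 1] != "^^^^^":
            raise ValueError("expected start of table after %r" % name)
        i += 2
        rows = []
        # Check for the end-of-table marker before reading another row.
        while lines[i] != "$$$$$":
            if lines[i] != "=====":
                raise ValueError("expected start of row, got %r" % lines[i])
            i += 1
            row = []
            # Stop at the end-of-row marker instead of consuming it.
            while lines[i] != "-----":
                row.append(lines[i])
                i += 1
            rows.append(row)
            i += 1  # skip "-----"
        tables[name] = rows
        i += 1  # skip "$$$$$"
    return title, descriptor, tables

# Made-up sample in the delimited format described above.
sample = "\n".join([
    "prefix line 1", "prefix line 2",
    "UniGene Cluster Hs.222015", "made-up descriptor",
    "EXPRESSION", "^^^^^",
    "=====", "brain", "liver", "-----",
    "$$$$$",
])
title, descriptor, tables = parse_unigene(sample)
print(title)
print(tables)
```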

                      Cayte


From jchang at SMI.Stanford.EDU  Fri Jan 12 01:06:11 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Second go at GenBank parser
In-Reply-To: <14939.39249.775192.733606@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.31.0101112201380.8076-100000@taiyang>

On Tue, 9 Jan 2001, Brad Chapman wrote:

> Hi Jeff!
>    Thanks for getting back with me. Sorry I have been so slow. I was
> away over the break and then was so excited to code when I got back
> that I jumped right into working hard-core on Biocorba (more on
> that fairly soon :-).

Yep, I know how that is.  I'm just getting back from break and am catching
up on my email.


> me:
> > > o Naming of modules -- right now my naming sucks (the "supplementary"
> > > feature classes, like Location.py and Reference.py are in a module
> > > called 'FeatureInfo', for instance. yeck.), so if people have good
> > > ideas for how to name things I'll definitely take 'em.
> Jeff:
> > Are these meant to be used with SeqFeatures?  If so, how about just
> > SeqFeature.Location and SeqFeature.Reference?
>
> Do you mean put them all in the SeqFeature.py module? That sounds like
> a fine solution to me (just wanted to be sure I understand you).

I think so.  If these are all classes that are to be used with
SeqFeatures, then that seems like a reasonable place to put it.


> > > Finally, I noticed Jeff put his snazzy code in GenBank/__init__.py --
> > > Should my GenBank.py go into __init__.py?
> >
> > Yes.  GenBank is a good name for it, and as per Andrew's earlier email, we
> > should avoid having code in both GenBank/__init__.py and
> > GenBank/GenBank.py.
>
> Okay, so you want me to integrate it with your __init__.py stuff? That
> is no problem, I just wanted to be sure. I definitely want to avoid
> Andrew's __init__.py/GenBank.py-type problem.

Yep, integrate away!


> One thing -- I added the ability to index files as a Dictionary (a la
> the other Parsers). Is it too confusing to have Dictionary and
> NCBIDictionary in the same module? Just curious.

It's fine with me.  Anyone else?


Jeff


From dalke at acm.org  Tue Jan 16 20:57:23 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Martel debug_level = 2 bug
Message-ID: <002601c08028$d807a240$81ab323f@josiah>

There's a small bug with the debug_level = 2 option
in Martel.  When the debug position is within the
first 8 characters it does not show the match text.
Here's the context diff for a patch.

*** Generate.py.orig    Tue Jan 16 20:41:48 2001
--- Generate.py Tue Jan 16 12:32:25 2001
***************
*** 460,466 ****
              s = s[:17] + " ... " + s[-17:]
          self.msg = s
      def __call__(self, text, x, end):
!         print "Match %s (x=%d): %s" % (repr(text[x-8:x+8]), x,
                                              repr(self.msg))
          return x

--- 460,466 ----
              s = s[:17] + " ... " + s[-17:]
          self.msg = s
      def __call__(self, text, x, end):
!         print "Match %s (x=%d): %s" % (repr(text[max(0, x-8):x+8]), x,
                                              repr(self.msg))
          return x


Pretty basic problem, which points out the usability problem
in having negative ranges mean something.
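Here is the failure in concrete terms. When x is less than 8, `x-8` is negative, so Python treats it as an offset from the end of the string. For any text longer than x+8 characters, the slice then comes back empty. Below is a minimal sketch contrasting the expression before and after the patch. The SWISS-PROT-style `record` line is only an illustrative input.

```python
def context_buggy(text, x, width=8):
    # Original expression: for x < width the start index is negative
    # and Python counts it from the end of the string.
    return text[x - width:x + width]

def context_fixed(text, x, width=8):
    # Patched expression: clamp the start of the slice at 0.
    return text[max(0, x - width):x + width]

record = "ID   CRAM_CRAAB     STANDARD;      PRT;    46 AA.\n"
print(repr(context_buggy(record, 3)))   # '' (start wraps to near the end)
print(repr(context_fixed(record, 3)))   # 'ID   CRAM_C'
```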

Really, this output should be improved to be more descriptive.
I have problems figuring out which character is the current
debug position because things like "\012" (newline) add
characters to the string so it isn't always in the same place.

The current output looks like:
Match 'Q63631;\012' (x=29): '(?P<AC>AC   (?P<a ... +)\\;)*(\\n|\\r\\n?))'

The "Match " is present so there is a well-defined piece of
text to key off of, which is important when there is other debug
output.

The second field is the 8 characters +/- the scan position.
My problem is I don't know where that position is without
counting manually, and I don't remember if it's 8 characters or
7 or what.  

The third field is the character position in the string.
It is "(x=29)" for this case, but should probably be "pos=29"
to be more understandable.  (The internal variable name is
x but should also likely be "pos".)

The 4th and last field is the string representation of the
part of the regular expression that matched.  It is at most
40 characters.  If the field is greater than 40 characters,
the first and last 17 characters are used and the " ... "
inserted as a marker for the missing text, as you see above.

I chose 40 characters since that seems to keep the character
count under 80 columns.
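The abbreviation rule above is small enough to write out. Below is a sketch. It assumes the condition is simply `len(s) > 40`, as described. The slicing matches the `s[:17] + " ... " + s[-17:]` context line in the diff, and `shorten` is a hypothetical name.

```python
def shorten(s, limit=40, keep=17):
    """If s is longer than `limit`, keep its first and last `keep`
    characters joined by ' ... '; otherwise return it unchanged."""
    if len(s) > limit:
        s = s[:keep] + " ... " + s[-keep:]
    return s

print(shorten("0123456789" * 5))  # 01234567890123456 ... 34567890123456789
```

With the defaults, an abbreviated string is 17 + 5 + 17 = 39 characters, so it never exceeds the 40-character limit.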

There are a couple of ways to change this.  I could break the
match text into two parts, to make it easy to find where the
pre and post parts are, as in

Match 'Q636' '31;\012' pos=24 '[0-9]'

or I could use two lines of information, like

Match 'Q63631;\012' pos=29 '(?P<AC>AC   (?P<a ... +)\\;)*(\\n|\\r\\n?))'
              ^

With two lines I could make the repr of the regex longer, which
provides more context to the match, as in

Match 'Q63631;\012' pos=29 '(?P<AC>AC   (?P<abcdef>(\w+))some [0-9]'
              ^            'text (\\n|\\r\\n?))'

This probably is the most useful, although it takes up twice
as much space.
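Below is a rough sketch of the two-line format. It assumes the caret should sit under the first (possibly escaped) character at the scan position. `debug_lines` and `escape` are hypothetical helpers, not Martel code.

Escaping is done by hand with the `unicode_escape` codec, so the caret column can be computed from the length of the escaped text before the position. `repr` makes that awkward, because it adds quotes and may change the quote character. Embedded quote characters are not escaped here, which is a simplification.

```python
def escape(s):
    # Backslash-escape non-printable characters, e.g. "\n" -> "\\n".
    return s.encode("unicode_escape").decode("ascii")

def debug_lines(text, pos, regex_repr, width=8):
    """Return (match_line, caret_line) for the scan position `pos`."""
    start = max(0, pos - width)
    before = escape(text[start:pos])
    after = escape(text[pos:pos + width])
    line1 = "Match '%s%s' pos=%d %s" % (before, after, pos, regex_repr)
    # The caret goes under the first escaped character at `pos`.
    caret_col = len("Match '") + len(before)
    line2 = " " * caret_col + "^"
    return line1, line2

line1, line2 = debug_lines("AC   P01234; Q63631;\n", 13, repr("[0-9]"))
print(line1)
print(line2)
```

Computing the column from the escaped prefix is what keeps the caret aligned even when earlier characters, such as newlines, expand to several printed characters.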

Brad, you use this debug level a lot.  What are your thoughts on
usability?

                    Andrew


From dalke at acm.org  Wed Jan 17 03:51:44 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] neat Martel trick
Message-ID: <009e01c08062$b8e2de40$c6ab323f@josiah>

Forgot to point out the prodoc parser in the 0.5 release
has a neat trick.

> This parser does extra work to identify [footnote,records], <PRODOC>
> and (EC) links, which has about an extra 50% impact on performance (80
> seconds instead of 50).


reference_note = Martel.Group("reference_note",
                 Martel.Re("\[(?P<note>E?\d+)(,(?P<note>E?\d+))*\]"))

prodoc_link = Martel.Group("prodoc_link",
             Martel.Str("<") + prodoc_num + Martel.Str(">"))

# This is incomplete and doesn't allow things like 1.1.-.-
ec_link = Martel.Group("ec_link",
    Martel.Re("\(EC *(?P<ec_number>[1-9][0-9]*\.[1-9][0-9]*" \
              "\.[1-9][0-9]*\.([1-9][0-9]*|-))\)"))

generic_text = reference_note | \
               prodoc_link | \
               ec_link | \
               Martel.Re("[^\R]")

What this all says is, if a given character position starts
a substring looking like:
 -- "[1]" or "[2,3]" then it's a footnote reference
 -- "<PDOC00001>" then it's a PRODOC reference
 -- "(EC 1.2.3.4)" then it's an E.C. number
otherwise that character is just a regular character.

What this *does* is provide a way to mark up semi-free form
text by providing detection of certain items.  For example,
I could also have had a pattern for http links, or email
addresses, or ...
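The same first-match-wins scan can be sketched with Python's `re` module, for anyone who wants to experiment outside Martel. This is an approximation, not the prodoc format definition:

  - Python does not allow a group name to repeat, so the footnote numbers are captured as one comma-separated group and split afterwards.
  - `prodoc_num` is not shown in the message, so the "PDOC plus five digits" pattern is an assumption.
  - `markup` and `tag_group` are hypothetical names.

At each position the three patterns are tried in order. If none matches, a single character is copied through, XML-escaped.

```python
import re
from xml.sax.saxutils import escape

FOOTNOTE = re.compile(r"\[(E?\d+(?:,E?\d+)*)\]")
# Assumption: a PRODOC accession is "PDOC" followed by five digits.
PRODOC = re.compile(r"<(PDOC\d{5})>")
EC = re.compile(r"\(EC *([1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*"
                r"\.(?:[1-9][0-9]*|-))\)")

def tag_group(line, m, outer, inner):
    """Wrap the whole match in <outer> and regex group 1 in <inner>."""
    return "<%s>%s<%s>%s</%s>%s</%s>" % (
        outer, escape(line[m.start():m.start(1)]),
        inner, escape(m.group(1)), inner,
        escape(line[m.end(1):m.end()]), outer)

def markup(line):
    out, pos = [], 0
    while pos < len(line):
        m = FOOTNOTE.match(line, pos)
        if m:
            notes = ",".join("<note>%s</note>" % n
                             for n in m.group(1).split(","))
            out.append("<reference_note>[%s]</reference_note>" % notes)
            pos = m.end()
            continue
        m = PRODOC.match(line, pos)
        if m:
            out.append(tag_group(line, m, "prodoc_link", "prodoc_num"))
            pos = m.end()
            continue
        m = EC.match(line, pos)
        if m:
            out.append(tag_group(line, m, "ec_link", "ec_number"))
            pos = m.end()
            continue
        # No special item starts here: copy one ordinary character.
        out.append(escape(line[pos]))
        pos += 1
    return "".join(out)

print(markup("This is similar to <PDOC12345> which talks about (EC 1.2.3.-)."))
```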

In other words, the line

This is similar to <PDOC12345> which talks about (EC 1.2.3.-).

gets parsed as

This is similar to <prodoc_link>&lt;<prodoc_num>PDOC12345</prodoc_num>&gt;</prodoc_link>
which talks about <ec_link>(EC <ec_number>1.2.3.-</ec_number>)</ec_link>.

which can easily be turned into links to other databases
or for [] footnotes, into #relative references to the current
page.

I've done this by hand and it's hard because you don't want
to replace text which was already replaced once before.
Guess I should have included it in the Python paper, but it
was getting long.  Perhaps at the conference.

                    Andrew
                    dalke@acm.org


From katel at worldpath.net  Thu Jan 18 03:09:36 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] UniGene parser
Message-ID: <001b01c08126$005848a0$010a0a0a@cadence.com>

  I just added a parser for UniGene.  I used Martel in my first attempt, but
after a discussion with Andrew, I decided to use sgmllib.  Martel is
appropriate for flat text, but sgmllib is better when HTML is used
extensively.

  I hope to add test code this weekend, during the snow storm.

                                      Cayte


From dalke at acm.org  Sat Jan 20 20:01:15 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
Message-ID: <008a01c08345$a8ffeaa0$ffab323f@josiah>

Hey all,

  The Martel changes now are all very minor so I want
to move the CVS tree for Martel over to the biopython
tree.  As I mentioned some time ago, I would like to
have it be parallel to the "Bio" level and set up to
be distributable on its own.

  What are the problems with that?  One is that it
would be nice if the biopython distribution only
installs one additional name to the module import
namespace.  Another is that it complicates a biopython
install by either automatically installing another
package (since Martel could already exist) or by
requiring another dependency.

  I don't think either of those are serious, but
are there any other reasonable objections?

  Also, I'm told that I can just drop the whole Martel
CVS tree into the biopython CVS tree and have it all
work.  Anyone ever done that?  I'll experiment with it
anyway before actually doing it.

                    Andrew
                    dalke@acm.org


From katel at worldpath.net  Mon Jan 22 03:42:58 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Genbank
Message-ID: <001101c0844f$535d9a60$010a0a0a@cadence.com>

  As we bundle packages like Martel and TextTools, we need to agree on a
standard for pathing.  When I tried to run test_genbank.py, python did not
find them because they were not on PYTHONPATH.  My method is to store them in
a folder under Python20\Lib and use a prefix.

 We need to agree on some cookbook rules.

                           Cayte


From johann at egenetics.com  Mon Jan 22 11:15:06 2001
From: johann at egenetics.com (Johann Visagie)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
In-Reply-To: <008a01c08345$a8ffeaa0$ffab323f@josiah>; from dalke@acm.org on Sat, Jan 20, 2001 at 06:01:15PM -0700
References: <008a01c08345$a8ffeaa0$ffab323f@josiah>
Message-ID: <20010122181506.A87578@fling.sanbi.ac.za>

Andrew Dalke on 2001-01-20 (Sat) at 18:01:15 -0700:
> 
>   The Martel changes now are all very minor so I want
> to move the CVS tree for Martel over to the biopython
> tree.  As I mentioned some time ago, I would like to
> have it be parallel to the "Bio" level and set up to
> be distributable on its own.
> 
>   What are the problems with that?

First, a question:  Will you still distribute Martel as a separate package,
or do you envisage it only to be available as part of BioPython?

If it will still be available independently, then I suggest that there be an
option not to install Martel when installing BioPython.  This would make life
easier for those of us trying to maintain Martel and/or BioPython in the
packaging schemas of various open source operating systems, since it would
then be possible to enforce the dependency externally to distutils and avoid
conflicts (provided that "standalone" Martel and the Martel included with
BioPython remain in synchronisation).

-- Johann

From dalke at acm.org  Mon Jan 22 11:46:49 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
Message-ID: <015201c08492$f0e4ec00$eaab323f@josiah>

Johann:
>Will you still distribute Martel as a separate package,
>or do you envisage it only to be available as part of BioPython?

I would rather it be distributed independent from Biopython.

I am worried about the increased barrier to entry because it
means more work will need to be done to get Biopython installed.
I realize with packaging systems that barrier is very small,
but I have no feel for what most people do to install a package.

The location of Martel in the CVS repository (as its own tree
or as a branch of the main biopython tree) makes only a
small difference on how it is distributed.  As part of the
main tree it is easier to distribute as one large package,
but it is still simple to distribute as different packages.

So I don't think where Martel is placed in CVS will have
an impact on how it is distributed.

Upon further reflection, if there are people interested in
Martel development but not Biopython development, then placing
it under the biopython CVS tree is a bit of a problem because
authentication is on the tree level, and not the branch one.
But I can't imagine anyone working on Martel who would need
that sort of forced restraint to keep from working on other
code.

There doesn't seem to be any real objection to placing the
Martel CVS development in $CVSROOT/biopython/Martel, right?

                    Andrew


From jchang at SMI.Stanford.EDU  Mon Jan 22 14:12:48 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
In-Reply-To: <015201c08492$f0e4ec00$eaab323f@josiah>
Message-ID: <Pine.GSO.4.21.0101221108180.2013-100000@riboweb.Stanford.EDU>

On Mon, 22 Jan 2001, Andrew Dalke wrote:

> Upon further reflection, if there are people interested in
> Martel development but not Biopython development, then placing
> it under the biopython CVS tree is a bit of a problem because
> authentication is on the tree level, and not the branch one.
> But I can't imagine anyone working on Martel who would need
> that sort of forced restraint to keep from working on other
> code.

I'm almost completely sure you're aware of this, but I should point out
that the reverse is true as well; everyone working on biopython code will
have access to Martel stuff as well.  Make sure you're comfortable with
this before checking stuff in.


> There doesn't seem to be any real objection to placing the
> Martel CVS development in $CVSROOT/biopython/Martel, right?

Fine with me.

Jeff


From johann at egenetics.com  Tue Jan 23 07:20:46 2001
From: johann at egenetics.com (Johann Visagie)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
In-Reply-To: <015201c08492$f0e4ec00$eaab323f@josiah>; from dalke@acm.org on Mon, Jan 22, 2001 at 09:46:49AM -0700
References: <015201c08492$f0e4ec00$eaab323f@josiah>
Message-ID: <20010123142046.D52192@fling.sanbi.ac.za>

Andrew Dalke on 2001-01-22 (Mon) at 09:46:49 -0700:
> 
> I would rather it be distributed independent from Biopython.

Moving somewhat O/T, I feel Martel also has many potential uses outside of
bioinformatics, and I have been doing my share of Martel advocacy amongst
non-bio developers.  ;-)

-- Johann

From katel at worldpath.net  Wed Jan 24 02:00:33 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Kabat
References: <Pine.GSO.4.21.0101221108180.2013-100000@riboweb.Stanford.EDU>
Message-ID: <001e01c085d3$5c5bfd20$010a0a0a@cadence.com>

   I plan to work on the Kabat format next.  It looks like a good candidate
for Martel.  We can avoid duplication of effort by posting before we start a
project.

                                             Cayte


From jchang at SMI.Stanford.EDU  Wed Jan 24 01:00:48 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Kabat
In-Reply-To: <001e01c085d3$5c5bfd20$010a0a0a@cadence.com>
Message-ID: <Pine.GSO.4.31.0101232200370.5183-100000@taiyang>

Great!  Looking forward to seeing the results.

Jeff


On Tue, 23 Jan 2001, Cayte wrote:

>    I plan to work on the Kabat format next.  It looks like a good candidate
> for Martel.  We can avoid duplication of effort by posting before we start a
> project.
>
>                                              Cayte
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
>


From chapmanb at arches.uga.edu  Wed Jan 24 05:58:51 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
In-Reply-To: <Pine.GSO.4.21.0101221108180.2013-100000@riboweb.Stanford.EDU>
References: <015201c08492$f0e4ec00$eaab323f@josiah>
	<Pine.GSO.4.21.0101221108180.2013-100000@riboweb.Stanford.EDU>
Message-ID: <14958.46443.330465.755842@taxus.athen1.ga.home.com>

Jeffrey Chang writes:
[Andrew muses about how Martel developers will have access to
Biopython code]
> I'm almost completely sure you're aware of this, but I should point out
> that the reverse is true as well; everyone working on biopython code will
> have access to Martel stuff as well.  Make sure you're comfortable with
> this before checking stuff in.

Hee hee! I'm finally going to have my chance to mess with the
Martel code base. All of my evil plans can now begin to take shape :-).

Andrew:
> > There doesn't seem to be any real objection to placing the
> > Martel CVS development in $CVSROOT/biopython/Martel, right?
> 
> Fine with me.

With me as well.

In terms of distribution, if we decide to put Martel inside the
Biopython tree, can we also distribute Martel with Biopython, in
addition to independently? I'm not sure if this was already the plan
or not, but it seems to make sense to me, especially if it is going to be in
the same CVS tree.

If we do this, then one potential problem is making life difficult for 
OSes with decent systems for installing dependencies, like Johann
was worried about for FreeBSD. In my vision, we could have a flag in
setup.py like install_martel = 1, which would be on by default. This
could then be set to zero if you want to install Martel from its
independent package separately.
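A rough sketch of how such a flag could drive the package list. `INSTALL_MARTEL` and the package names are hypothetical placeholders for illustration, not the actual Biopython setup.py:

```python
# Hypothetical setup.py fragment: INSTALL_MARTEL and the package names
# are illustrative assumptions, not Biopython's real setup script.

INSTALL_MARTEL = 1  # on by default; set to 0 to use a separately installed Martel

BIOPYTHON_PACKAGES = ["Bio", "Bio.SwissProt", "Bio.Fasta"]
MARTEL_PACKAGES = ["Martel"]


def packages_to_install(install_martel=INSTALL_MARTEL):
    """Return the list of packages to hand to setup(packages=...)."""
    packages = list(BIOPYTHON_PACKAGES)
    if install_martel:
        # Bundle Martel only when the flag is on.
        packages.extend(MARTEL_PACKAGES)
    return packages
```

The returned list would then be passed as the `packages=` argument of the Distutils `setup()` call, so a packager on FreeBSD could flip the flag to 0 and depend on the standalone Martel package instead.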

Is this what people were imagining, or am I completely off-target
here?

Brad


From chapmanb at arches.uga.edu  Wed Jan 24 06:05:08 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Genbank
In-Reply-To: <001101c0844f$535d9a60$010a0a0a@cadence.com>
References: <001101c0844f$535d9a60$010a0a0a@cadence.com>
Message-ID: <14958.46820.592004.160068@taxus.athen1.ga.home.com>

Cayte:
>   As we bundle packages like Martel and TextTools, we need to agree on a
> standard for pathing.  When I tried to run test_genbank.py, python did not
> find them because they were not on PYTHONPATH.  My method is to store them in
> a folder under Python20\Lib and use a prefix.
> 
>  We need to agree on some cookbook rules.

This is a good point, but I think it will mostly be taken care of by
Martel and mxTextTools moving to use Distutils for installation. In
this case, these will be installed to a directory on the PYTHONPATH by 
default, and if the person installing them puts them elsewhere on
their system, then they need to change their PYTHONPATH or whatever.

BTW, Windows is really the only platform without a good default place
to put third party modules. Both UNIX and Macintosh use the
lib/site-packages directory to put these modules. This is more a
general python on Windows problem (IMHO) than a Biopython specific
problem, though.
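For the PYTHONPATH question, one quick diagnostic is to ask the import system directly whether the bundled packages are visible. A small sketch using the standard `importlib.util.find_spec`; the names checked are just the ones from this thread, and `mx` is assumed to be the top-level package that mxTextTools installs into:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path.

    Only top-level names are passed to find_spec, because a dotted name
    would make it import the parent package (and fail if that is absent).
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["Martel", "mx"])
    if missing:
        print("Not found on sys.path:", ", ".join(missing))
        print("Directories searched:")
        for entry in sys.path:
            print("  ", entry)
```

If a package shows up as missing even though it is installed, the printed search path shows which directory needs to be added to PYTHONPATH.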

Brad


From jchang at SMI.Stanford.EDU  Wed Jan 24 12:08:48 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] move Martel CVS to biopython
In-Reply-To: <14958.46443.330465.755842@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.21.0101240906320.28385-100000@riboweb.Stanford.EDU>

On Wed, 24 Jan 2001, Brad Chapman wrote:

> In my vision, we could have a flag in setup.py like install_martel =
> 1, which would be on by default. This could then be set to zero if you
> want to install Martel from its independent package separately.

This would be a nice feature to have, if someone had the time and
inclination to do it.  Otherwise, it would probably be OK for the setup.py
script just to check for dependencies and their version numbers, and
complain if the user needs to install/upgrade.
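A minimal sketch of that dependency check, assuming each dependency exposes a plain dotted `__version__` string (an assumption; Martel and mxTextTools may record their versions differently). The version comparison is kept separate from the import so it can be exercised without the packages installed:

```python
import importlib


def parse_version(text):
    """Turn a plain dotted version such as '0.5' or '1.1.1' into a tuple of ints.

    Suffixes like 'b1' are not handled; this is a deliberately simple sketch.
    """
    return tuple(int(part) for part in text.split("."))


def version_complaint(name, found, minimum):
    """Return a message if `found` is missing or older than `minimum`, else None."""
    if found is None:
        return "%s is not installed; please install version %s or later" % (name, minimum)
    if parse_version(found) < parse_version(minimum):
        return "%s %s found; please upgrade to %s or later" % (name, found, minimum)
    return None


def check_dependency(name, minimum):
    """Import `name` and complain if it is absent or too old."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return version_complaint(name, None, minimum)
    found = getattr(module, "__version__", None)
    if found is None:
        return "could not determine the version of %s; need %s or later" % (name, minimum)
    return version_complaint(name, found, minimum)


if __name__ == "__main__":
    # "0.5" is only an example minimum, not a stated requirement.
    problem = check_dependency("Martel", "0.5")
    if problem:
        print(problem)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.10" would sort before "0.5".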

Jeff


From chapmanb at arches.uga.edu  Wed Jan 24 17:22:11 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Biopython-corba and CVS
Message-ID: <14959.21907.622852.186541@taxus.athen1.ga.home.com>

Hi all!
   
I have a question about biopython-corba and CVS. I've mucked up the
current CVS directory for biopython-corba pretty badly. I was pretty
inexperienced with CORBA and other things when I started, and ended up 
making lots of directory changes which makes things really complicated 
in there, with lots of empty directories and dead files. It is pretty
ugly. 

I was thinking that it might be better to move the current
biopython-corba module to something like biopython-corba-old and start 
fresh with the new biopython-corba release. I know this is really bad
CVS practice, but it seems better to me than the current mess I've
got in CVS.

What do people think about this? Any other suggestions? 

Brad


From chapmanb at arches.uga.edu  Sat Jan 27 12:49:59 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Working on Sequence deprecation
Message-ID: <14963.2631.439456.985606@taxus.athen1.ga.home.com>

Hello all;
I was working some this morning on deprecating Sequence.py (in favor
of Andrew's Seq.py), which I think is on our to-do list for the next
release. 

I'd done a little bit of work on this earlier on Fasta.py, and I
completed the job this morning and checked it in along with tests. I
then grepped for other stuff that uses Sequence.py, and came up with:

o Rebase and Gobase -- These contain SequenceParser classes, but
either these are left over from a copy and paste or the
_SequenceConsumer classes haven't been written yet, I guess. What
is the plan for these? It doesn't seem like the data really fits into
a sequence class, but I'm not sure.

o SwissProt -- I changed the SequenceParser to a simple
implementation that uses the SeqRecord and Seq classes. I didn't
really go into anything complicated like SeqFeatures yet. 
The context diff for this is attached. It also has a fix for OX
lines, which I think actually fixes my previous patch. I didn't
realize there wasn't a test for SProt before in the regression tests,
so my previous test didn't handle OX lines correctly on older files
(i.e., it bombs out if there isn't an OX line. I think the new one does
it right). Sorry about that, I think this might have been
the problem Andrew was talking about in his Martel tests.
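To make the scanner/consumer split and the OX change concrete, here is a stripped-down sketch in the same spirit. The class and function names are hypothetical, it builds plain strings instead of Seq/SeqRecord objects, and it is not Biopython's actual `_Scanner`; the point is only that treating OX as optional (zero or more lines) lets older files without an OX line parse:

```python
class ToySequenceConsumer:
    """Collects name, accession, description and sequence from scanner events."""

    def __init__(self):
        self.name = None
        self.id = None
        self.taxonomy_ids = []
        self.description = ""
        self.seq = ""

    def identification(self, line):
        # "ID   100K_RAT  STANDARD; ..." -> entry name is the second column
        self.name = line.split()[1]

    def accession(self, line):
        # "AC   P12345; Q67890;" -> keep the first accession as the ID
        if self.id is None:
            self.id = line[5:].rstrip().split(";")[0]

    def description_line(self, line):
        # Named description_line so it does not clash with the attribute.
        self.description += line[5:].strip() + "\n"

    def taxonomy_id(self, line):
        self.taxonomy_ids.append(line[5:].strip())

    def sequence_data(self, line):
        # Sequence lines are indented and grouped in blocks of ten residues.
        self.seq += line.replace(" ", "").rstrip()


def scan_record(lines, consumer):
    """Dispatch each line of one record to the matching consumer method.

    OX is simply handled when present, so a record with no OX line
    still parses (the 'any number' rather than 'one or more' rule).
    """
    handlers = {
        "ID": consumer.identification,
        "AC": consumer.accession,
        "DE": consumer.description_line,
        "OX": consumer.taxonomy_id,
    }
    in_sequence = False
    for line in lines:
        code = line[:2]
        if code == "SQ":
            in_sequence = True
        elif code == "//":
            break
        elif in_sequence:
            consumer.sequence_data(line)
        elif code in handlers:
            handlers[code](line)
    return consumer
```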

I think this is it, and then nothing will use Sequence.py. Pretty
exciting! What do people think? Ready for Sequence.py to go so we only 
have one sequence class?
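One gentle way to retire a module is to keep its old entry points for a release but have them emit a `DeprecationWarning` first. A sketch of that pattern; `make_sequence` is a hypothetical stand-in for an old Sequence.py constructor, and the replacement name in the message is only illustrative:

```python
import functools
import warnings


def deprecated(replacement):
    """Return a decorator that marks a callable as deprecated in favour of `replacement`."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper
            warnings.warn(
                "%s is deprecated; use %s instead" % (func.__name__, replacement),
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


@deprecated("Bio.Seq.Seq")
def make_sequence(data):
    """Hypothetical stand-in for an old Sequence.py constructor."""
    return str(data)
```

Code that still calls the old function keeps working but gets pointed at its own call site, which gives people a release to migrate before the module is removed.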

Additionally, have we also thought about getting rid of the SeqIO
directory? I think the current Fasta.py will do everything this does
right now, so we might not need it any more. What do people think?

Brad

-------------- next part --------------
*** SProt.py.orig	Wed Nov 29 19:37:27 2000
--- SProt.py	Sat Jan 27 12:35:14 2001
***************
*** 20,30 ****
  Dictionary         Accesses a SwissProt file using a dictionary interface.
  ExPASyDictionary   Accesses SwissProt records from ExPASy.
  RecordParser       Parses a SwissProt record into a Record object.
! SequenceParser     Parses a SwissProt record into a Sequence object.
  
  _Scanner           Scans SwissProt-formatted data.
  _RecordConsumer    Consumes SwissProt data to a Record object.
! _SequenceConsumer  Consumes SwissProt data to a Sequence object.
  
  
  Functions:
--- 20,30 ----
  Dictionary         Accesses a SwissProt file using a dictionary interface.
  ExPASyDictionary   Accesses SwissProt records from ExPASy.
  RecordParser       Parses a SwissProt record into a Record object.
! SequenceParser     Parses a SwissProt record into a Seq object.
  
  _Scanner           Scans SwissProt-formatted data.
  _RecordConsumer    Consumes SwissProt data to a Record object.
! _SequenceConsumer  Consumes SwissProt data to a Seq object.
  
  
  Functions:
***************
*** 36,42 ****
  import string
  from Bio import File
  from Bio import Index
! from Bio import Sequence
  from Bio.ParserSupport import *
  from Bio.WWW import ExPASy
  from Bio.WWW import RequestLimiter
--- 36,44 ----
  import string
  from Bio import File
  from Bio import Index
! from Bio import Alphabet
! from Bio import Seq
! from Bio import SeqRecord
  from Bio.ParserSupport import *
  from Bio.WWW import ExPASy
  from Bio.WWW import RequestLimiter
***************
*** 288,299 ****
          return self._consumer.data
  
  class SequenceParser:
!     """Parses SwissProt data into a Sequence object.
  
      """
!     def __init__(self):
          self._scanner = _Scanner()
!         self._consumer = _SequenceConsumer()
  
      def parse(self, handle):
          self._scanner.feed(handle, self._consumer)
--- 290,307 ----
          return self._consumer.data
  
  class SequenceParser:
!     """Parses SwissProt data into a Seq object.
  
      """
!     def __init__(self, alphabet = Alphabet.generic_protein):
!         """Initialize a SequenceParser.
! 
!         Arguments:
!         o alphabet - The alphabet to use for the generated Seq objects. If
!         not supplied this will default to the generic protein alphabet.
!         """
          self._scanner = _Scanner()
!         self._consumer = _SequenceConsumer(alphabet)
  
      def parse(self, handle):
          self._scanner.feed(handle, self._consumer)
***************
*** 390,396 ****
  
      def _scan_ox(self, uhandle, consumer):
          self._scan_line('OX', uhandle, consumer.taxonomy_id,
!                         one_or_more=1)
  
      def _scan_reference(self, uhandle, consumer):
          while 1:
--- 398,404 ----
  
      def _scan_ox(self, uhandle, consumer):
          self._scan_line('OX', uhandle, consumer.taxonomy_id,
!                         any_number=1)
  
      def _scan_reference(self, uhandle, consumer):
          while 1:
***************
*** 712,728 ****
              setattr(ref, m, string.rstrip(attr))
  
  class _SequenceConsumer(AbstractConsumer):
!     """Consumer that converts a SwissProt record to a Sequence object.
  
      Members:
!     data    Record with SwissProt data.
  
      """
!     def __init__(self):
          self.data = None
          
      def start_record(self):
!         self.data = Sequence.NamedSequence(Sequence.Sequence())
          
      def end_record(self):
          pass
--- 720,746 ----
              setattr(ref, m, string.rstrip(attr))
  
  class _SequenceConsumer(AbstractConsumer):
!     """Consumer that converts a SwissProt record to a Seq object.
  
      Members:
!     data      Record with SwissProt data.
!     alphabet  The alphabet the generated Seq objects will have.
  
      """
!     def __init__(self, alphabet = Alphabet.generic_protein):
!         """Initialize a Sequence Consumer
! 
!         Arguments:
!         o alphabet - The alphabet to use for the generated Seq objects. If
!         not supplied this will default to the generic protein alphabet.
!         """
          self.data = None
+         self.alphabet = alphabet
          
      def start_record(self):
!         seq = Seq.Seq("", self.alphabet)
!         self.data = SeqRecord.SeqRecord(seq)
!         self.data.description = ""
          
      def end_record(self):
          pass
***************
*** 730,738 ****
      def identification(self, line):
          cols = string.split(line)
          self.data.name = cols[1]
          
      def sequence_data(self, line):
!         seq = string.rstrip(string.replace(line, " ", ""))
          self.data.seq = self.data.seq + seq
  
  def index_file(filename, indexname, rec2key=None):
--- 748,765 ----
      def identification(self, line):
          cols = string.split(line)
          self.data.name = cols[1]
+ 
+     def accession(self, line):
+         ids = string.split(string.rstrip(line[5:]), ';')
+         self.data.id = ids[0]
+ 
+     def description(self, line):
+         self.data.description = self.data.description + \
+                                 string.strip(line[5:]) + "\n"
          
      def sequence_data(self, line):
!         seq = Seq.Seq(string.rstrip(string.replace(line, " ", "")),
!                       self.alphabet)
          self.data.seq = self.data.seq + seq
  
  def index_file(filename, indexname, rec2key=None):
From jchang at SMI.Stanford.EDU  Sat Jan 27 21:12:41 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Working on Sequence deprecation
In-Reply-To: <14963.2631.439456.985606@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.31.0101271758260.15278-100000@taiyang>

On Sat, 27 Jan 2001, Brad Chapman wrote:

> Hello all;
> I was working some this morning on deprecating Sequence.py (in favor
> of Andrew's Seq.py), which I think is on our to-do list for the next
> release.

Yes, great job!  Thanks a lot for doing this.

> I'd done a little bit of work on this earlier on Fasta.py, and I
> completed the job this morning and checked it in along with tests.

Good.  Peter Wilkinson is working on a solution for parsing the
description line better, with support for NCBI's formats.  This stuff
should be coming in in the next few weeks or so.


> I then grepped for other stuff that uses Sequence.py, and came up
> with:
>
> o Rebase and Gobase -- These contain SequenceParser classes, but
> either these are left over from a copy and paste or the
> _SequenceConsumer classes haven't been written yet, I guess. What
> is the plan for these? It doesn't seem like the data really fits into
> a sequence class, but I'm not sure.

Don't know.  These seem to be stub classes that haven't been implemented
yet.  What are your plans for them, Cayte?  Are they removeable?


> o SwissProt -- I changed the SequenceParser to a simple
> implementation that uses the SeqRecord and Seq classes. I didn't
> really go into anything complicated like SeqFeatures yet.
> The context diff for this is attached. It also has a fix for OX
> lines, which I think actually fixes my previous patch. I didn't
> realize there wasn't a test for SProt before in the regression tests,
> so my previous test didn't handle OX lines correctly on older files
> (i.e., it bombs out if there isn't an OX line. I think the new one does
> it right). Sorry about that, I think this might have been
> the problem Andrew was talking about in his Martel tests.


Great.  Please check your patch in.

> I think this is it, and then nothing will use Sequence.py. Pretty
> exciting! What do people think? Ready for Sequence.py to go so we only
> have one sequence class?

Yes, definitely exciting.  It'll be nice to have a common object that all
these different formats can map into.


> Additionally, have we also thought about getting rid of the SeqIO
> directory? I think the current Fasta.py will do everything this does
> right now, so we might not need it any more. What do people think?

Let's leave the directory in for now, for future expansion.  Eventually,
we should have a function (like bioperl's) where the user can point SeqIO
to a sequence file, and it'll figure out how to parse it into a Seq class.
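A rough sketch of the format-sniffing idea, assuming the decision can be made from the first non-blank line of the file. The function names and the small set of formats are illustrative only, and telling SWISS-PROT from EMBL by the `PRT;` field is a heuristic:

```python
def guess_format(first_line):
    """Guess a sequence file format from its first non-blank line (heuristic)."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("LOCUS"):
        return "genbank"
    if first_line.startswith("ID   "):
        # SWISS-PROT and EMBL both begin with an ID line; SWISS-PROT
        # ID lines carry the molecule type "PRT;" for protein entries.
        return "swissprot" if "PRT;" in first_line else "embl"
    return None


def guess_file_format(path):
    """Open a file and guess its format from the first non-blank line."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return guess_format(line)
    return None
```

A SeqIO front end could then use the guessed name to choose which parser to hand the file to, and fall back to asking the user when the guess is `None`.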

Jeff


From jchang at SMI.Stanford.EDU  Sat Jan 27 21:17:04 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Biopython-corba and CVS
In-Reply-To: <14959.21907.622852.186541@taxus.athen1.ga.home.com>
Message-ID: <Pine.GSO.4.31.0101271813160.15278-100000@taiyang>

Since nobody else has posted an opinion to the group, I'd like to say that
it's normal for CVS repositories to degrade over time as designs get
changed.  It sounds like it's starting to get hard to manage, so I say go
for it.  :)

Jeff


On Wed, 24 Jan 2001, Brad Chapman wrote:

> Hi all!
>
> I have a question about biopython-corba and CVS. I've mucked up the
> current CVS directory for biopython-corba pretty badly. I was pretty
> inexperienced with CORBA and other things when I started, and ended up
> making lots of directory changes which makes things really complicated
> in there, with lots of empty directories and dead files. It is pretty
> ugly.
>
> I was thinking that it might be better to move the current
> biopython-corba module to something like biopython-corba-old and start
> fresh with the new biopython-corba release. I know this is really bad
> CVS practice, but it seems better to me than the current mess I've
> got in CVS.
>
> What do people think about this? Any other suggestions?
>
> Brad
>


From chapmanb at arches.uga.edu  Sun Jan 28 10:58:35 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Working on Sequence deprecation
In-Reply-To: <Pine.GSO.4.31.0101271758260.15278-100000@taiyang>
References: <14963.2631.439456.985606@taxus.athen1.ga.home.com>
	<Pine.GSO.4.31.0101271758260.15278-100000@taiyang>
Message-ID: <14964.16811.501068.654047@taxus.athen1.ga.home.com>

Hi Jeff!

[Talking about Sequence.py deprecation]
> > I'd done a little bit of work on this earlier on Fasta.py, and I
> > completed the job this morning and checked it in along with tests.
> 
> Good.  Peter Wilkinson is working on a solution for parsing the
> description line better, with support for NCBI's formats.  This stuff
> should be coming in in the next few weeks or so.

Sweet! I'll be really happy to see that -- it would be nice to have some
robust functions for getting info from "standard" FASTA title lines.
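As a starting point, here is a sketch that splits the simple NCBI-style `gi|number|db|accession|locus` identifier off the description. It assumes that single-identifier layout only (NCBI titles can carry several concatenated identifiers, which this does not attempt), and it is unrelated to Peter's work:

```python
def parse_ncbi_title(title):
    """Split a FASTA title like '>gi|129295|sp|P01013|OVAX_CHICK Gene X'.

    Returns a dict of identifier fields plus the free-text description.
    Titles that do not start with a 'gi|' identifier are returned as a
    bare description.
    """
    title = title.strip()
    if title.startswith(">"):
        title = title[1:]
    ident, _, description = title.partition(" ")
    fields = ident.split("|")
    if len(fields) >= 4 and fields[0] == "gi":
        return {
            "gi": fields[1],
            "db": fields[2],
            "accession": fields[3],
            "locus": fields[4] if len(fields) > 4 else "",
            "description": description.strip(),
        }
    return {"description": title}
```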

[SProt.py patch]
> Great.  Please check your patch in.

Okee dokee. I also added a quick test for SProt.py, so that I
hopefully won't break anything in the future, and added a test case
that uses the 'OX' line.

> > Additionally, have we also thought about getting rid of the SeqIO
> > directory? I think the current Fasta.py will do everything this does
> > right now, so we might not need it any more. What do people think?
> 
> Let's leave the directory in for now, for future expansion.  Eventually,
> we should have a function (like bioperl's) where the user can point SeqIO
> to a sequence file, and it'll figure out how to parse it into a Seq class.

Cool, that makes good sense.

BTW, Jeff, I was going to nominate you on the main list to be the
biopython representative for BOSC 2001 -- do you want that
responsibility? I know you are just the man to do it, but I didn't
want to give you stuff to do you don't want! 

Brad


From chapmanb at arches.uga.edu  Sun Jan 28 11:00:58 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Biopython-corba and CVS
In-Reply-To: <Pine.GSO.4.31.0101271813160.15278-100000@taiyang>
References: <14959.21907.622852.186541@taxus.athen1.ga.home.com>
	<Pine.GSO.4.31.0101271813160.15278-100000@taiyang>
Message-ID: <14964.16954.195447.79210@taxus.athen1.ga.home.com>

[I talk about mucking up the biopython-corba CVS repository]

Jeff:
> Since nobody else has posted an opinion to the group, I'd like to say that
> it's normal for CVS repositories to degrade over time as designs get
> changed.  It sounds like it's starting to get hard to manage, so I say go
> for it.  :)

Okee dokee. I just imported the new stuff as biopython-corba, and
moved the old one to biopython-corba-0.1 (which I guess is also handy
if people want 0.1 support). Hopefully I can keep myself from messing
up this clean new CVS directory :-).

Brad


From katel at worldpath.net  Mon Jan 29 02:12:08 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Working on Sequence deprecation
References: <14963.2631.439456.985606@taxus.athen1.ga.home.com>
Message-ID: <001401c089c2$cc234c60$010a0a0a@cadence.com>

> o Rebase and Gobase -- These contain SequenceParser classes, but
> either these are left over from a copy and paste or the
> _SequenceConsumer classes haven't been written yet, I guess. What
> is the plan for these? It doesn't seem like the data really fits into
> a sequence class, but I'm not sure.
>
>
  No plan; I was just scared to remove them, just in case.  I can remove
them.  Unless someone suggests a better plan?

                                 Cayte


From chapmanb at arches.uga.edu  Tue Jan 30 19:15:27 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:42:56 2005
Subject: [Biopython-dev] Working on Sequence deprecation
In-Reply-To: <001401c089c2$cc234c60$010a0a0a@cadence.com>
References: <14963.2631.439456.985606@taxus.athen1.ga.home.com>
	<001401c089c2$cc234c60$010a0a0a@cadence.com>
Message-ID: <14967.22815.497102.762668@taxus.athen1.ga.home.com>

Hi Cayte!

[I was wondering about the status of Rebase and Gobase Sequence parsers]

Cayte:
>   No plan; I was just scared to remove them, just in case.  I can remove
> them.  Unless someone suggests a better plan?

Removing them sounds fine to me -- it would be no big deal to add them 
later, if there was a need.

So +1 on removing them from me.

Brad