[Biopython-dev] Working on Sequence deprecation

Sat Jan 27 12:49:59 EST 2001

Hello all;
I spent some time this morning working on deprecating Sequence.py (in
favor of Andrew's Seq.py), which I think is on our to-do list for the
next release.

I'd done a little bit of work on this earlier on Fasta.py, and I
completed the job this morning and checked it in along with tests. I
then grepped for other stuff that uses Sequence.py, and came up with:

o Rebase and Gobase -- These contain SequenceParser classes, but
either these are left over from a copy and paste or the
_SequenceConsumer classes haven't been written yet, I guess. What
is the plan for these? It doesn't seem like the data really fits into
a sequence class, but I'm not sure.

o SwissProt -- I changed the SequenceParser to a simple
implementation that uses the SeqRecord and Seq classes. I didn't
really go into anything complicated like SeqFeatures yet. 
The context diff for this is attached. It also has a fix for OX
lines, which I think actually fixes my previous patch. I didn't
realize there wasn't a test for SProt in the regression tests, so
my previous patch didn't handle OX lines correctly on older files
(i.e., it bombs out if there isn't an OX line; I think the new one
does it right). Sorry about that. I think this might have been the
problem Andrew was talking about in his Martel tests.
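
For anyone who wants to see the idea without reading the whole diff,
here is a rough, self-contained sketch of the mapping the SwissProt
change makes: ID goes to the name, the first accession to the id, DE
lines to the description, and the sequence lines are joined together.
It is not the Biopython code. SimpleRecord and parse_swissprot_text
are made-up names, it uses only the standard library, and the sample
entry is invented for illustration. It also treats the OX line as
optional, which is the behavior the any_number change is after.

```python
from dataclasses import dataclass


@dataclass
class SimpleRecord:
    """Hypothetical stand-in for a SeqRecord-like object."""
    id: str = ""
    name: str = ""
    description: str = ""
    seq: str = ""
    taxonomy_id: str = ""  # stays empty when the entry has no OX line


def parse_swissprot_text(text):
    """Build a SimpleRecord from the lines of one SwissProt-style entry."""
    record = SimpleRecord()
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End-of-entry marker.
            break
        if in_sequence:
            # Sequence lines hold residues in space-separated blocks.
            record.seq += line.replace(" ", "").rstrip()
            continue
        key, rest = line[:2], line[5:].strip()
        if key == "ID":
            record.name = line.split()[1]
        elif key == "AC" and not record.id:
            # Keep only the first accession seen (a choice made for this sketch).
            record.id = rest.split(";")[0]
        elif key == "DE":
            record.description += rest + "\n"
        elif key == "OX":
            # Optional: older entries simply never reach this branch.
            record.taxonomy_id = rest
        elif key == "SQ":
            in_sequence = True
    return record


# Invented entry with no OX line, as in older files.
ENTRY = """\
ID   EXAMPLE_TEST   STANDARD;      PRT;    10 AA.
AC   X00001; X00002;
DE   HYPOTHETICAL EXAMPLE PROTEIN.
SQ   SEQUENCE   10 AA;  1111 MW;  ABCDEF CRC32;
     MKTAY IAKQR
//
"""

record = parse_swissprot_text(ENTRY)
print(record.id, record.name, record.seq)
```

The point of the sketch is only that a missing OX line leaves
taxonomy_id empty instead of stopping the parse.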

I think this is it, and then nothing will use Sequence.py. Pretty
exciting! What do people think? Ready for Sequence.py to go so we only 
have one sequence class?

Additionally, have we thought about getting rid of the SeqIO
directory? I think the current Fasta.py does everything SeqIO does
right now, so we might not need it any more. What do people think?

Brad

-------------- next part --------------
*** SProt.py.orig	Wed Nov 29 19:37:27 2000
--- SProt.py	Sat Jan 27 12:35:14 2001
***************
*** 20,30 ****
  Dictionary         Accesses a SwissProt file using a dictionary interface.
  ExPASyDictionary   Accesses SwissProt records from ExPASy.
  RecordParser       Parses a SwissProt record into a Record object.
! SequenceParser     Parses a SwissProt record into a Sequence object.

  _Scanner           Scans SwissProt-formatted data.
  _RecordConsumer    Consumes SwissProt data to a Record object.
! _SequenceConsumer  Consumes SwissProt data to a Sequence object.

  Functions:
--- 20,30 ----
  Dictionary         Accesses a SwissProt file using a dictionary interface.
  ExPASyDictionary   Accesses SwissProt records from ExPASy.
  RecordParser       Parses a SwissProt record into a Record object.
! SequenceParser     Parses a SwissProt record into a Seq object.

  _Scanner           Scans SwissProt-formatted data.
  _RecordConsumer    Consumes SwissProt data to a Record object.
! _SequenceConsumer  Consumes SwissProt data to a Seq object.

  Functions:
***************
*** 36,42 ****
  import string
  from Bio import File
  from Bio import Index
! from Bio import Sequence
  from Bio.ParserSupport import *
  from Bio.WWW import ExPASy
  from Bio.WWW import RequestLimiter
--- 36,44 ----
  import string
  from Bio import File
  from Bio import Index
! from Bio import Alphabet
! from Bio import Seq
! from Bio import SeqRecord
  from Bio.ParserSupport import *
  from Bio.WWW import ExPASy
  from Bio.WWW import RequestLimiter
***************
*** 288,299 ****
          return self._consumer.data

  class SequenceParser:
!     """Parses SwissProt data into a Sequence object.

      """
!     def __init__(self):
          self._scanner = _Scanner()
!         self._consumer = _SequenceConsumer()

      def parse(self, handle):
          self._scanner.feed(handle, self._consumer)
--- 290,307 ----
          return self._consumer.data

  class SequenceParser:
!     """Parses SwissProt data into a Seq object.

      """
!     def __init__(self, alphabet = Alphabet.generic_protein):
!         """Initialize a SequenceParser.
! 
!         Arguments:
!         o alphabet - The alphabet to use for the generated Seq objects. If
!         not supplied this will default to the generic protein alphabet.
!         """
          self._scanner = _Scanner()
!         self._consumer = _SequenceConsumer(alphabet)

      def parse(self, handle):
          self._scanner.feed(handle, self._consumer)
***************
*** 390,396 ****

      def _scan_ox(self, uhandle, consumer):
          self._scan_line('OX', uhandle, consumer.taxonomy_id,
!                         one_or_more=1)

      def _scan_reference(self, uhandle, consumer):
          while 1:
--- 398,404 ----

      def _scan_ox(self, uhandle, consumer):
          self._scan_line('OX', uhandle, consumer.taxonomy_id,
!                         any_number=1)

      def _scan_reference(self, uhandle, consumer):
          while 1:
***************
*** 712,728 ****
              setattr(ref, m, string.rstrip(attr))

  class _SequenceConsumer(AbstractConsumer):
!     """Consumer that converts a SwissProt record to a Sequence object.

      Members:
!     data    Record with SwissProt data.

      """
!     def __init__(self):
          self.data = None

      def start_record(self):
!         self.data = Sequence.NamedSequence(Sequence.Sequence())

      def end_record(self):
          pass
--- 720,746 ----
              setattr(ref, m, string.rstrip(attr))

  class _SequenceConsumer(AbstractConsumer):
!     """Consumer that converts a SwissProt record to a Seq object.

      Members:
!     data      Record with SwissProt data.
!     alphabet  The alphabet the generated Seq objects will have.

      """
!     def __init__(self, alphabet = Alphabet.generic_protein):
!         """Initialize a Sequence Consumer
! 
!         Arguments:
!         o alphabet - The alphabet to use for the generated Seq objects. If
!         not supplied this will default to the generic protein alphabet.
!         """
          self.data = None
+         self.alphabet = alphabet

      def start_record(self):
!         seq = Seq.Seq("", self.alphabet)
!         self.data = SeqRecord.SeqRecord(seq)
!         self.data.description = ""

      def end_record(self):
          pass
***************
*** 730,738 ****
      def identification(self, line):
          cols = string.split(line)
          self.data.name = cols[1]

      def sequence_data(self, line):
!         seq = string.rstrip(string.replace(line, " ", ""))
          self.data.seq = self.data.seq + seq

  def index_file(filename, indexname, rec2key=None):
--- 748,765 ----
      def identification(self, line):
          cols = string.split(line)
          self.data.name = cols[1]
+ 
+     def accession(self, line):
+         ids = string.split(string.rstrip(line[5:]), ';')
+         self.data.id = ids[0]
+ 
+     def description(self, line):
+         self.data.description = self.data.description + \
+                                 string.strip(line[5:]) + "\n"

      def sequence_data(self, line):
!         seq = Seq.Seq(string.rstrip(string.replace(line, " ", "")),
!                       self.alphabet)
          self.data.seq = self.data.seq + seq

  def index_file(filename, indexname, rec2key=None):