[BioPython] Sequence numbering. Moving on...

Iddo Friedberg idoerg@cc.huji.ac.il
Wed, 6 Oct 1999 11:46:04 +0200 (GMT+0200)


On Wed, 6 Oct 1999 Thomas.Sicheritz@molbio.uu.se wrote:

: Andrew Dalke writes:
:  > I do see your point, and I need to consider it some more.  The use
:  > case you have,
:  >
:  > > MySlice(7,30).seqReverse().translate()
:  >
:  > is slightly problematical, since Python sequences return None from
:  > a call to reverse().
: 
: Again - for practical reasons, why not an additional return_seqReverse method ?
:  MySlice(7,30).return_seqReverse().translate()
: 

seqReverse(), unlike reverse(), returns a new Seq object and leaves the
original unchanged. The implementation:

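import copy

def seqReverse(self):
   # Work on a copy so the original Seq is left untouched.
   retSeq = copy.deepcopy(self)
   retSeq.reverse()   # reverses in place and returns None
   return retSeq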

The use of deepcopy() here is bad, I know: (1) it is inefficient in both
memory and time, and (2) we may not want to carry over all the other
attributes' values (such as annotation). The idea is to return a new
instance. A hidden method that copies just the sequence attribute is
perhaps in order.
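
A minimal sketch of what such a hidden method might look like. The Seq
class below is a hypothetical stand-in, not the real one: the attribute
names seq and annotation, the list storage, and the helper name
_copySeqOnly are all assumptions made for illustration.

```python
class Seq:
    """Hypothetical stand-in for the Seq class discussed in this thread."""

    def __init__(self, seq, annotation=None):
        self.seq = list(seq)                # assumed: residues kept in a list
        self.annotation = annotation or {}  # assumed: free-form annotation

    def _copySeqOnly(self):
        # Hypothetical hidden helper: build a new instance from the sequence
        # data alone, so no annotation is carried over and no deepcopy is needed.
        return self.__class__(self.seq)

    def reverse(self):
        # Mirrors list.reverse(): works in place and returns None.
        self.seq.reverse()

    def seqReverse(self):
        # Return a reversed copy; the original is left unchanged.
        retSeq = self._copySeqOnly()
        retSeq.reverse()
        return retSeq
```

Because seqReverse() returns an object, calls can be chained, while the
original still keeps its annotation and its order.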

: 
: If I am able to understand your parser, I can try to add rules for
: SWISSPROT.

Oh, that'll be great. Actually, there isn't really a parser there, just
something that assumes a GenBank format, and even then it probably makes a
mess of things. The idea is to first read the file, separate the sequence
from the annotation (perhaps using Python's Multifile), then parse the
annotation in initAnnot. Something along those lines anyway.
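
A rough sketch of the separation step, under stated assumptions: it
relies only on the GenBank flat-file convention that the sequence block
starts at the ORIGIN line and the record ends at "//". It uses plain
string handling rather than Multifile, and the function name splitRecord
is hypothetical.

```python
def splitRecord(text):
    """Split one GenBank-style record into (annotation_lines, sequence)."""
    annotation = []
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break                       # end-of-record marker
        if line.startswith("ORIGIN"):
            in_sequence = True          # everything after this is sequence
            continue
        if in_sequence:
            # Sequence lines carry a position number and spaces; keep letters only.
            residues.append("".join(ch for ch in line if ch.isalpha()))
        else:
            annotation.append(line)
    return annotation, "".join(residues)
```

The annotation lines returned here are what would then be handed to
initAnnot for parsing.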


: My current project parses ~1000 SWISSPROT entries each day and
: I have never thought about event based parsing - but it seems to have some
: great advantages over my document-orientated pseudoparsers ... :-)

I actually like the idea of dynamically adding attributes according to
whatever the parsed text throws at me. Pretty useful with the variability
of available documentation, even in the same format. This stems from the
variability of the information sequences carry, especially in the
nucleotide sequences: some have several ORFs, others have none; some have
promoter sites, RNA stem-and-loops, and other unique features. Since we
cannot possibly hope to cover them all, some form of event-based parsing
should be used for reading the documentation.
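
To make the idea concrete, here is a small sketch of event-driven
annotation reading with dynamically added attributes. Everything in it
is an assumption for illustration: the Annotation class, the handle and
scan names, and the simplification that a keyword is whatever starts in
the first column of a line.

```python
class Annotation:
    """Hypothetical container whose attributes are created as events arrive."""

    def handle(self, keyword, value):
        name = keyword.lower()
        if hasattr(self, name):
            # A repeated keyword is collected into a list instead of overwritten.
            existing = getattr(self, name)
            if not isinstance(existing, list):
                existing = [existing]
            existing.append(value)
            setattr(self, name, existing)
        else:
            setattr(self, name, value)


def scan(lines, handler):
    """Fire one event per 'KEYWORD value' line.

    Indented (continuation) lines and empty lines are skipped in this sketch.
    """
    for line in lines:
        if line and not line[0].isspace():
            keyword, _, value = line.partition(" ")
            handler(keyword, value.strip())
```

The scanner knows nothing about which keywords exist, so a record with
an unusual feature simply produces an extra attribute on the object.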

Iddo

--

/* --- */main(c){float t,x,y,b=-2,a=b;for(;b-=a>2?.1/(a=-2):0,b<2;
/*  |  */putchar(30+c),a+=.0503) for(x=y=c=0;++c<90&x*x+y*y<4;y=2*
/*  |  */x*y+b,x=t)t=x*x-y*y+a;}
/* --- ddo Friedberg */