[Biopython-dev] incomplete patch to parse Genbank-TPA record BK000008

Andreas Kuntzagk andreas.kuntzagk at mdc-berlin.de
Mon Sep 15 07:06:39 EDT 2003


Since I'm stuck on this, I'm posting the incomplete patch in the hope that
somebody will have time to look at it. At the moment it parses the
BK000008 and BK000018 entries in GenBank, and hopefully others as well.

BUT: the information is put into the Record only as one long string
with the newlines deleted. If anybody really needs this info, it
should be stored in a better way.
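
One possible "better way", as a rough sketch only: split each PRIMARY
reference row into named fields instead of keeping the raw text. This is
not part of the patch. The names PrimaryRef and parse_primary_ref are
hypothetical, the sample rows are invented to match the column layout the
grammar below expects (TPA_SPAN, PRIMARY_IDENTIFIER, PRIMARY_SPAN, COMP),
and the code assumes a current Python with typing.NamedTuple.

```python
from typing import NamedTuple


class PrimaryRef(NamedTuple):
    """One row of a PRIMARY block (hypothetical structure, not in the patch)."""
    tpa_span: tuple      # (start, end) on the TPA record
    identifier: str      # identifier of the primary sequence
    primary_span: tuple  # (start, end) on the primary sequence
    complement: bool     # True when the COMP column holds "c"


def _parse_span(text):
    """Turn a span such as "1067-2231" into (1067, 2231)."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_primary_ref(content):
    """Split one whitespace-separated PRIMARY row into named fields."""
    fields = content.split()
    if len(fields) not in (3, 4):
        raise ValueError("unexpected PRIMARY row: %r" % content)
    complement = len(fields) == 4 and fields[3] == "c"
    return PrimaryRef(
        _parse_span(fields[0]),
        fields[1],
        _parse_span(fields[2]),
        complement,
    )


# Invented sample rows, not copied from BK000008 or BK000018.
rows = [
    "            1-1066              AB000001.1         1-1066",
    "            1067-2231           AB000002.1         3423-4587           c",
]
refs = [parse_primary_ref(row) for row in rows]
```

With something like this, the primary_ref_line handler could append
parse_primary_ref(content) to the record's primary list rather than the
raw string, assuming content arrives as one row of text.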




-------------- next part --------------
Index: Bio/GenBank/Record.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/GenBank/Record.py,v
retrieving revision 1.8
diff -r1.8 Record.py
9,11d8
< # standard modules
< import string
< 
175a173
>         self.primary=[]
Index: Bio/GenBank/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v
retrieving revision 1.42
diff -r1.42 __init__.py
988c988
< 
---
>         
1068c1068
< 
---
>         
1074c1074
< 
---
>         
1077a1078,1084
>     def primary_ref_line(self,content):
>         """Data for the PRIMARY line"""
>         self.data.primary.append(content)
> 
>     def primary(self,content):
>         pass
>     
1217c1224
<                               "sequence", "contig_location", "record_end"]
---
>                               "sequence", "contig_location", "record_end","primary_ref_line"]
Index: Bio/GenBank/genbank_format.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/GenBank/genbank_format.py,v
retrieving revision 1.28
diff -r1.28 genbank_format.py
22,24c22
< # standard library
< import string
<      
---
> 
303a302,313
> # PRIMARY
> primary_line = Martel.Group("primary_line",
>                             Martel.Str("PRIMARY") +
>                             blank_space +
>                             Martel.Str("TPA_SPAN") +
>                             blank_space +
>                             Martel.Str("PRIMARY_IDENTIFIER") +
>                             blank_space +
>                             Martel.Str("PRIMARY_SPAN") +
>                             blank_space +
>                             Martel.Str("COMP") + 
>                             Martel.ToEol())
304a315,327
> primary_ref_line =Martel.Group("primary_ref_line",
>                                blank_space +
>                                Martel.Re(r"\d+\-\d+") +
>                                blank_space +
>                                Martel.Re(r"[\S]+") +
>                                blank_space +
>                                Martel.Re(r"\d+\-\d+")+
>                                Martel.Opt(blank_space +  Martel.Str("c"))+
>                                Martel.ToEol())
>                               
> primary =  Martel.Group("primary",primary_line +
>                         Martel.Rep1(primary_ref_line))
>                                        
735a759
>                       Martel.Opt(primary) +\

