[Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/930 (fwd)

gert thijs gert.thijs@esat.kuleuven.ac.be
Wed, 28 Mar 2001 18:02:59 +0200


I have been trying to solve this problem for a few days now and I think I
finally have found the real cause.
I have written a script that first downloads a genbank entry and than writes
this entry to a flat file using Bio::SeqIO. If I compare the original entry
and the one written in the file there is a important difference. Here is an
example of the sequence with accession umber AP000413, where I have selected
the description of the second CDS.

In the original sequence the location is given as
complement(join(6390..6420,...)), while in the written entry the same location
is represented as: complement(join(complement(6390..6420),...))

Parsing the original sequence is no problem, but when reading the sequence
from flat file an warning message is printed:
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of
complement(join(complement(60316..60427), ignoring feature (seqid=AP000413)
---------------------------------------------------


Original entry:
--
     CDS             complement(join(6390..6420,6562..6717,6805..6929,
                     7023..7155,7233..7363,7518..7682,7771..7863,8124..8391,
                     8467..8595,8778..8857,8947..9015,9102..9218,9350..9418,
                     9888..9941,10054..10091,10184..10295,10447..10569,
                     10651..10678,10938..10994,11149..11243))
                     /note="gb|AAF34831.1
                     gene_id:K7L4.2"
                     /codon_start=1
                     /evidence=not_experimental
                     /product="MAP kinase"
                     /protein_id="BAB02151.1"
                     /db_xref="GI:9294249"
                    
/translation="MDDVAGLQEAAGARFSQIELIGRGSFGDVYKAFDKDLNKEVAIK
                    
VIDLEESEDEIEDIQKEISVLSQCRCPYITEYYGSYLHQTKLWIIMEYMAGGSVADLL
                    
QSNNPLDETSIACITRDLLHAVEYLHNEGKIHRDIKAANILLSENGDVKVADFGVSAQ
                    
LTRTISRRKTFVGTPFWMAPEVIQNSEGYNEKADIWSLGITVIEMAKGEPPLADLHPM
                    
RVLFIIPRETPPQLDEHFSRQVKEFVSLCLKKAPAERPSAKELIKHRFIKNARKSPKL
                    
LERIRERPKYQVKEDEETPRNGAKAPVESSGTVRIARDERSQGAPGYSFQGNTVKNAG
                    
WDFTVGGSQSIGTVRALKPPQARERRQEVSPNRISQRTTRPSGNQWSSATGSTISEAS
                    
EGGFVRRHPFQNDHEDGFHEEDDSSLSGSGTVVIRTPRSSQSSSVFREPSSGSSGRYA
                    
AFDDASASGTVVVRGQYDDSGSPRTPKSRLGIQERTSSASEDSNANLAEAKAALDAGF
                    
RRGKARERLGMGNNNNDGKVNRRREQMADDSDYSRNSGDKSSKQKVVPRSEQVSDEED
                    
DSIWESLPASLSVLLIPSLKEALGDDSKESTVRTVSRSLVMMEREKPGSCEAFVAKLI
                    
ELLGSSKEASVKELHDMAVCVFAKTTPDNAENKMKQANKEFSSNTNVSPLGRFLLSRW
                     LGQSSRDL"
--


The same entry as written in the flatfile:
--
     CDS             complement(join(complement(6390..6420),6562..6717,
                     6805..6929,7023..7155,7233..7363,7518..7682,7771..7863,
                     8124..8391,8467..8595,8778..8857,8947..9015,9102..9218,
                     9350..9418,9888..9941,10054..10091,10184..10295,
                     10447..10569,10651..10678,10938..10994,11149..11243))
                     /product="MAP kinase"
                     /protein_id="BAB02151.1"
                     /codon_start=1
                    
/translation="MDDVAGLQEAAGARFSQIELIGRGSFGDVYKAFDKDLNKEVAIKV
                    
IDLEESEDEIEDIQKEISVLSQCRCPYITEYYGSYLHQTKLWIIMEYMAGGSVADLLQS
                    
NNPLDETSIACITRDLLHAVEYLHNEGKIHRDIKAANILLSENGDVKVADFGVSAQLTR
                    
TISRRKTFVGTPFWMAPEVIQNSEGYNEKADIWSLGITVIEMAKGEPPLADLHPMRVLF
                    
IIPRETPPQLDEHFSRQVKEFVSLCLKKAPAERPSAKELIKHRFIKNARKSPKLLERIR
                    
ERPKYQVKEDEETPRNGAKAPVESSGTVRIARDERSQGAPGYSFQGNTVKNAGWDFTVG
                    
GSQSIGTVRALKPPQARERRQEVSPNRISQRTTRPSGNQWSSATGSTISEASEGGFVRR
                    
HPFQNDHEDGFHEEDDSSLSGSGTVVIRTPRSSQSSSVFREPSSGSSGRYAAFDDASAS
                    
GTVVVRGQYDDSGSPRTPKSRLGIQERTSSASEDSNANLAEAKAALDAGFRRGKARERL
                    
GMGNNNNDGKVNRRREQMADDSDYSRNSGDKSSKQKVVPRSEQVSDEEDDSIWESLPAS
                    
LSVLLIPSLKEALGDDSKESTVRTVSRSLVMMEREKPGSCEAFVAKLIELLGSSKEASV
                     KELHDMAVCVFAKTTPDNAENKMKQANKEFSSNTNVSPLGRFLLSRWLGQSSRDL"
                     /db_xref="GI:9294249"
                     /evidence="not_experimental"
                     /note="gb|AAF34831.1gene_id:K7L4.2"
--



Jason Stajich wrote:
> 
> Gert, Petre :
> 
> (Petre I *think* this is the same type of bug you were getting)
> 
> I fixed this bug on the 07 branch and main trunk - added your suggested
> protection to test that the seqfeature was actually still existant (this
> was a bad code decision, I think the rationale was to die when an
> unparseable location line was read, but that is not very smart IMHO).  I
> don't know if you are working off CVS or just downloading bioperl
> packages.  If you are working off CVS, anonymous cvs will have these
> changes in about 3 hours.  We can generate patches for you if you want to
> patch your local distribution, but CVS will be easier for you in the long
> run as we continue to shake out the bugs (assuming firewalls are not an
> issue for you).
> 
> Please let me know if these changes do not fix your situation.
> -Jason
> 
>  On Wed, 21 Mar 2001, gert thijs wrote:
> 
> > Jason Stajich wrote:
> > >
> > > Please send me the accession number of the offending genbank file.
> > >
> >
> > The accession number was AP000413
> >
> >
> > Gert Thijs
> >
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/

-- 
+ Gert Thijs              
+ 
+ email: gert.thijs@esat.kuleuven.ac.be 
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+ 
+ K.U.Leuven
+ ESAT-SISTA 
+ Kasteelpark Arenberg 10 
+ B-3001 Leuven-Heverlee  
+ Belgium  
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70