[Bioperl-l] Re: *major* error in genbank parser or am i just insane?

Lin, Xiaoying J. Xiaoying.Lin@celera.com
Fri, 9 Aug 2002 11:38:08 -0700


Lincoln,

I agree that the code should not play a guessing game over human
mistakes such as out-of-sync mRNA and CDS joins.

But for records with CDS features and no exon features, I am not sure I
understand you correctly. There are lots of submissions in GenBank that
come only with CDS (join) features and no separate exon features. If
that is a mistake, it is a systematic one. How does the current parser
handle a record like this one?
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=1458097&dopt=GenBank
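
To make the CDS-only case concrete, here is a minimal Python sketch (not BioPerl code, and not how the current parser works) of what Brian suggests for case (a) in the quoted thread below: turning a simple CDS join into placeholder segments for the presumed exons. The names `Segment` and `presumed_exons` are hypothetical. It assumes only local `start..end` spans inside an optional `join(...)`; anything else, such as a remote join, is rejected so the caller can keep the raw location string rather than guess.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Placeholder sub-feature for one presumed exon (hypothetical type)."""
    start: int
    end: int


# Matches a plain local span such as "100..200" (optionally "<100..>200").
_SPAN = re.compile(r"^<?(\d+)\.\.>?(\d+)$")


def presumed_exons(location: str) -> List[Segment]:
    """Split a simple join(...) location into placeholder segments.

    Only local 'start..end' parts are interpreted. Remote references,
    complement(...), and other operators raise ValueError so the caller
    can store the location string as-is instead of interpreting it.
    """
    text = location.strip()
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = _SPAN.match(part.strip())
        if match is None:
            raise ValueError(f"uninterpreted location part: {part!r}")
        segments.append(Segment(int(match.group(1)), int(match.group(2))))
    return segments
```

For example, `presumed_exons("join(10..20,30..45)")` yields two segments, while a part that names another accession raises `ValueError`.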

I have not finished reading the older e-mails on this subject, so I may
have missed something here.  I thought everyone was busy having fun in
Edmonton; when did you guys find time to flood everyone's e-mail box ;-)?


BTW, I enjoyed your talk and the others' talks at BOSC.

Thanks.

Xiaoying

> -----Original Message-----
> From: Lincoln Stein [mailto:lstein@cshl.org]
> Sent: Friday, August 09, 2002 1:28 PM
> To: brian.king@animorphics.net; Brian King; Ewan Birney
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Re: *major* error in genbank parser or am i
> just insane?
> 
> 
> Here's my 2c:
> 
> If the genbank entry has CDS features but no exons, or an mRNA join
> operator which is out of sync with the CDS join, then in my opinion the
> quality of the annotation is so questionable that BioSQL should throw up
> its hands and seek human assistance in interpretation.  Asking the import
> software to read the minds of the submitters is beyond what can be
> reasonably expected, and only ends up propagating errors.
> 
> Lincoln
> 
> On Friday 09 August 2002 04:49 am, Brian King wrote:
> > > This is very hard to do because you have to handle:
> > >
> > >
> > >    (a) CDS with no Exons
> > >
> > > and, my particular favourite
> > >
> > >    (b) a mRNA join operator which is out of sync with the CDS
> > > join operator (!)
> >
> > For (a) I'd put generic sub-features in the CDS to
> > hold the places of the presumed exons, and for (b) use
> > generic sub-features for the CDS and the mRNA joins
> > and just let them be out of sync.  I surrender on
> > remote joins!  I'd keep the location string in
> > documentation in the data, but not try to interpret
> > it.  Ideally the parser would download the remote
> > record, but...
> >
> > Regards,
> > Brian
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> 
> -- 
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                            Cold Spring Harbor, NY
> ========================================================================
>