[Bioperl-l] Genbank to gff3 conversion problem

Smithies, Russell Russell.Smithies at agresearch.co.nz
Thu Jan 29 21:55:30 UTC 2009


Do you still have problems if you try to parse the complete record rather than that pesky version with contig joins?
Eg http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=37558541&view=gbwithparts


--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Mungall
> Sent: Friday, 30 January 2009 10:27 a.m.
> To: shafeeq rim
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Genbank to gff3 conversion problem
> 
> 
> This looks like a problem with the Unflattener code (which I wrote a
> while ago) - as ChrisF states, this is sometimes required to
> reconstruct the hierarchical subfeature relations from the 'flat'
> genbank model
> 
> Here is  NT_011512:
> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&val=37558541
> 
> It's human chromosome 21
> 
> This genbank record that is split over multiple sub-records, using
> join directives to stitch it all back together. I once new how to
> handle such beasts in bioperl, but I think I've forgotten. In any
> case, the unflattener does not make any attempt to download and join
> together the sequence fragments, it just gets confused by this record
> and throws a wobbly.
> 
> A bioperl guru may be able to help with the correct strategy here, but
> roughly speaking I can see two alternate approaches:
> 
> 1. reconstruct the entire chromosome record and feed that to the
> unflattener
> 2. iterate though each record running the gb2gff conversion on each,
> then stitch the gff records together
> 
> 2 will be quite tricky because of the coordinate transformation. On
> the other hand, there may be memory issues.
> 
> I'm sure this is doable as I used to regularly load all of genbank
> human into a bioSQL database using a predecessor of the unflattener;
> I'm just not sure what the best strategy is these days. I have a
> feeling the human assemblies are (or were) available on a one file per
> chromosome basis from the NCBI ftp site. This may be your best bet.
> 
> On Jan 28, 2009, at 8:23 PM, shafeeq rim wrote:
> 
> > Hi,
> >
> > I was trying my hands on  bp_genbank2gff3.pl script
> > in Bioperl but getting lots of errors. I first tried with .gbk file
> in
> > genbank and then .gbs file but still no success. I actually want
> Human
> > genome annotation file in gff3 format for my application. Is there
> > any other
> > software or script for gff3 conversion from genbank format apart from
> > this bioperl script? Readseq utility is there but it converts only
> > into
> > gff2.
> > I extracted and removed problematic file but still of no use
> > NT_011512.
> >
> > I am getting errors like this :-
> > I am using this command perl bp_genbank2gff3.pl ref_chr21.gbk
> > ##########################################################
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/
> > share/perl/5.8.8/Bio/SeqFeature/Tools/Unflattener.pm:1549
> > STACK (eval) bp_genbank2gff3.pl:378
> > STACK main::unflatten_seq bp_genbank2gff3.pl:377
> > STACK toplevel bp_genbank2gff3.pl:229
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withNT_011512: consult STDERR
> >
> > ------------- EXCEPTION  -------------
> > MSG: seq_id must be set
> > STACK
> Bio::SeqFeature::Tools::IDHandler::generate_unique_persistent_id
> > /usr/local/share/perl/5.8.8/Bio/SeqFeature/Tools/IDHandler.pm:242
> > STACK main::generic_features bp_genbank2gff3.pl:353
> > STACK toplevel bp_genbank2gff3.pl:254
> > #####################################################
> >
> > Thanks in advance
> > Regards
> > Shafeeq
> >
> >
> >      Add more friends to your messenger and enjoy! Go to
> http://messenger.yahoo.com/invite/
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================




More information about the Bioperl-l mailing list