[Bioperl-l] Genbank to gff3 conversion problem
Chris Mungall
cjm at berkeleybop.org
Thu Jan 29 21:26:46 UTC 2009
This looks like a problem with the Unflattener code (which I wrote a
while ago) - as ChrisF states, this is sometimes required to
reconstruct the hierarchical subfeature relations from the 'flat'
genbank model
Here is NT_011512:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&val=37558541
It's human chromosome 21
This genbank record that is split over multiple sub-records, using
join directives to stitch it all back together. I once new how to
handle such beasts in bioperl, but I think I've forgotten. In any
case, the unflattener does not make any attempt to download and join
together the sequence fragments, it just gets confused by this record
and throws a wobbly.
A bioperl guru may be able to help with the correct strategy here, but
roughly speaking I can see two alternate approaches:
1. reconstruct the entire chromosome record and feed that to the
unflattener
2. iterate though each record running the gb2gff conversion on each,
then stitch the gff records together
2 will be quite tricky because of the coordinate transformation. On
the other hand, there may be memory issues.
I'm sure this is doable as I used to regularly load all of genbank
human into a bioSQL database using a predecessor of the unflattener;
I'm just not sure what the best strategy is these days. I have a
feeling the human assemblies are (or were) available on a one file per
chromosome basis from the NCBI ftp site. This may be your best bet.
On Jan 28, 2009, at 8:23 PM, shafeeq rim wrote:
> Hi,
>
> I was trying my hands on bp_genbank2gff3.pl script
> in Bioperl but getting lots of errors. I first tried with .gbk file in
> genbank and then .gbs file but still no success. I actually want Human
> genome annotation file in gff3 format for my application. Is there
> any other
> software or script for gff3 conversion from genbank format apart from
> this bioperl script? Readseq utility is there but it converts only
> into
> gff2.
> I extracted and removed problematic file but still of no use
> NT_011512.
>
> I am getting errors like this :-
> I am using this command perl bp_genbank2gff3.pl ref_chr21.gbk
> ##########################################################
> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/
> share/perl/5.8.8/Bio/SeqFeature/Tools/Unflattener.pm:1549
> STACK (eval) bp_genbank2gff3.pl:378
> STACK main::unflatten_seq bp_genbank2gff3.pl:377
> STACK toplevel bp_genbank2gff3.pl:229
>
> --------------------------------------
>
> Possible gene unflattening error withNT_011512: consult STDERR
>
> ------------- EXCEPTION -------------
> MSG: seq_id must be set
> STACK Bio::SeqFeature::Tools::IDHandler::generate_unique_persistent_id
> /usr/local/share/perl/5.8.8/Bio/SeqFeature/Tools/IDHandler.pm:242
> STACK main::generic_features bp_genbank2gff3.pl:353
> STACK toplevel bp_genbank2gff3.pl:254
> #####################################################
>
> Thanks in advance
> Regards
> Shafeeq
>
>
> Add more friends to your messenger and enjoy! Go to http://messenger.yahoo.com/invite/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list