[Bioperl-l] parse EMBL Feature Table only
Frank Schwach
fs5 at sanger.ac.uk
Mon Dec 14 12:18:17 UTC 2009
Hi,
Maybe I'm really missing something here, but I can't work out how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:
FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]
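The location column in this sample uses EMBL location syntax: 1-based, inclusive `start..end` ranges, with `join(...)` listing the pieces of a spliced feature such as a multi-exon CDS. A minimal Python sketch of turning such a string into coordinate pairs follows. `parse_location` is a hypothetical helper, not a BioPerl API, and it deliberately covers only plain ranges, `join(...)` and an outer `complement(...)` (the last is an assumption; the sample shows no minus-strand features).

```python
import re

# One "start..end" range. EMBL coordinates are 1-based and inclusive; the
# optional "<" / ">" partial-feature markers are accepted but discarded.
RANGE_RE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_location(location):
    """Return (strand, [(start, end), ...]) for a simple EMBL location.

    Supported: "a..b", "join(a..b,c..d,...)", and either of those wrapped in
    an outer "complement(...)". Anything else (order(), remote accessions,
    single-base positions, complement inside join) raises ValueError.
    """
    location = "".join(location.split())  # drop whitespace from wrapped lines
    strand = 1
    if location.startswith("complement(") and location.endswith(")"):
        strand = -1
        location = location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    segments = []
    for part in location.split(","):
        match = RANGE_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location part: {part!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return strand, segments
```

The spliced length of a feature is then `sum(end - start + 1 for start, end in segments)`.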
So the file has no header and no sequence; it is used simply to
annotate a chromosome in a genome assembly. I've always used GFF for
that purpose, but I have now been given this file.
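Since this annotation would normally live in GFF, one option is to convert each feature once it has been read. A minimal sketch, assuming a CDS whose segments are already known as `(start, end)` pairs and which starts in frame (phase 0 on its first segment in transcription order): it writes one GFF3 line per segment, all sharing one ID as GFF3 does for discontinuous features, and derives each segment's phase from the coding bases that precede it. `cds_to_gff3`, `escape_gff3`, and the seqid/ID values in the usage are hypothetical.

```python
def escape_gff3(value):
    """Percent-encode characters that GFF3 reserves in column 9."""
    for ch in "%;=&,\t\n":  # "%" first so earlier escapes are not re-escaped
        value = value.replace(ch, "%{:02X}".format(ord(ch)))
    return value


def cds_to_gff3(seqid, feature_id, segments, strand, qualifiers, source="."):
    """Return GFF3 lines (one per segment) for a single CDS feature.

    segments: list of 1-based inclusive (start, end) pairs.
    strand: 1 or -1. qualifiers: dict of str -> str.
    Assumes the CDS begins in frame, so its first segment has phase 0.
    """
    # Phase depends on how many coding bases precede a segment, counted in
    # transcription order (ascending on +, descending on -).
    phases = {}
    consumed = 0
    for start, end in sorted(segments, reverse=(strand == -1)):
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1

    attributes = [f"ID={escape_gff3(feature_id)}"]
    for name, value in qualifiers.items():
        attributes.append(f"{escape_gff3(name)}={escape_gff3(value)}")
    column9 = ";".join(attributes)
    strand_char = "+" if strand == 1 else "-"

    # Output lines in ascending coordinate order.
    return [
        "\t".join([seqid, source, "CDS", str(start), str(end), ".",
                   strand_char, str(phases[(start, end)]), column9])
        for start, end in sorted(segments)
    ]
```

For example, `cds_to_gff3("chr1", "cds1", [(1, 10), (20, 31)], 1, {"product": "RNA-binding protein, putative"})` gives phase 0 for the first segment and 2 for the second, and the comma in the product is written as `%2C`.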
Bio::SeqIO->new(-format => "EMBL") complains about the missing header, and
if I insert a fake ID line, it warns about the missing sequence and about
the features not fitting on the (zero-length) sequence.
Of course, it's not difficult to write my own parser, but I'm sure there
must be a BioPerl way to do this that I've just overlooked. Thanks
for your help.
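For the do-it-yourself route, a minimal Python sketch of a feature-table reader is below; it is not a BioPerl facility. It assumes the standard EMBL column layout (feature key in columns 6-20, location or qualifier text from column 22) rather than the collapsed spacing a mail client may produce, and simply skips any line not starting with `FT`, so a missing ID line or sequence does not matter. `parse_feature_table` and `_unquote` are hypothetical names.

```python
def _unquote(value):
    """Strip EMBL double quotes; a doubled "" inside stands for one quote."""
    if value is not None and len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1].replace('""', '"')
    return value


def parse_feature_table(lines):
    """Parse EMBL 'FT' lines into a list of feature dicts.

    Each dict has "key", "location" and "qualifiers" (a list of
    (name, value) pairs; value is None for flags such as /pseudo).
    Limitation: a continuation line that itself begins with "/" is
    misread as a new qualifier.
    """
    features = []
    current = None    # feature being built
    qualifier = None  # [name, raw value] being built
    for line in lines:
        if not line.startswith("FT"):
            continue
        key = line[5:21].strip()
        text = line[21:].strip()
        if key:
            current = {"key": key, "location": text, "qualifiers": []}
            features.append(current)
            qualifier = None
        elif current is None or not text:
            continue
        elif text.startswith("/"):
            name, _, value = text[1:].partition("=")
            qualifier = [name, value or None]
            current["qualifiers"].append(qualifier)
        elif qualifier is None:
            # Location wrapped onto further lines: concatenate as-is.
            current["location"] += text
        else:
            # Free-text continuation joined with one space; right for
            # /product or /note, wrong for /translation (no space there).
            qualifier[1] = (qualifier[1] or "") + " " + text
    for feature in features:
        feature["qualifiers"] = [(name, _unquote(value))
                                 for name, value in feature["qualifiers"]]
    return features
```

Any iterable of lines works, including an open file handle, e.g. `parse_feature_table(open(path))`.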