[Bioperl-l] Question about parsing a gb file
Torsten Seemann
torsten.seemann at infotech.monash.edu.au
Mon Mar 30 00:25:48 UTC 2009
> Hi everybody, I have a little problem/question in parsing a GenBank file.
> I've got a $s = Bio::Seq object to which I've added
> some Bio::SeqFeature::Generic objects. Everything seems to be OK here, since
> I can see all the properties of $s set correctly in my visual debugger;
> for instance, I can find the display_name property of the SeqFeature in
> the $s object.
> Then I run print Bio::SeqIO->new(-format => 'genbank')->write_seq($s)
> to write out the GenBank file, but in the output I can no longer find some
> properties of the sequence, like the "display_name".
> What is happening?
>
>   my $s = $str->next_seq();
>   my $f = Bio::SeqFeature::Generic->new(
>       -start        => 10,
>       -end          => 100,
>       -strand       => -1,
>       -primary      => 'CDS', # -primary_tag is a synonym
>       -source_tag   => 'repeatmasker',
>       -display_name => 'alu family'
>   );
>   $s->add_SeqFeature($f);
>   print Bio::SeqIO->new(-format => 'genbank')->write_seq($s);

The logical conclusion is that the 'genbank' output format does not
store the -display_name attribute of a SeqFeature. If you look at the
output of your script you will see only this:

     CDS             complement(10..100)

You will have to add appropriate -tag => { name => value, ... } entries to
your SeqFeature, using qualifier names from the GenBank/EMBL feature table:
http://www.ncbi.nlm.nih.gov/collab/FT/
In particular I think you want to do the following:

  my $f = Bio::SeqFeature::Generic->new(
      -start   => 10,
      -end     => 100,
      -strand  => -1,
      -primary => 'CDS', # -primary_tag is a synonym
      -tag     => {      # written out as /qualifier="value" lines
          product   => 'alu family',
          note      => 'repeatmasker',
          locus_tag => 'GENE00432', # etc
      }
  );

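If the features already exist and carry a display_name, one option is to copy that name into a qualifier just before writing. The following is only a minimal sketch: the placeholder sequence, the 'demo' id and the choice of /note as the target qualifier are assumptions for illustration, not something from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;

# Placeholder 120 bp sequence standing in for $str->next_seq() (assumption).
my $s = Bio::Seq->new(
    -id       => 'demo',
    -seq      => 'ACGT' x 30,
    -alphabet => 'dna',
);

my $f = Bio::SeqFeature::Generic->new(
    -start        => 10,
    -end          => 100,
    -strand       => -1,
    -primary      => 'CDS',
    -source_tag   => 'repeatmasker',
    -display_name => 'alu family',
);
$s->add_SeqFeature($f);

# Copy each non-empty display_name into a /note qualifier so that the
# genbank writer has a tag to emit for it.
for my $feat ($s->get_SeqFeatures) {
    my $name = $feat->display_name;
    next unless defined $name && length $name;
    $feat->add_tag_value('note', $name);
}

# Write to an in-memory filehandle so the result can be inspected.
my $genbank = '';
open my $fh, '>', \$genbank or die "Cannot open in-memory handle: $!";
my $out = Bio::SeqIO->new(-format => 'genbank', -fh => $fh);
$out->write_seq($s);
$out->close;

print $genbank;
```

The feature table in the output should then carry a /note qualifier under the CDS line. Note also that write_seq() does the writing itself, so there is no need to print its return value.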
Hope this helps,
--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA