[Bioperl-l] getting DNA sequence for exon features from GFF

Thu Aug 26 10:20:48 EDT 2010

Kanmani,

If you are using BioPerl, the best option currently available is to load a database with all the relevant information (GFF and FASTA), then query that database. The most commonly used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can also handle GFF2/GTF, and it has several database adaptors (MySQL, Pg, BDB, SQLite).

chris
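As a rough, dependency-free illustration of this load-then-query pattern, here is a minimal sketch in core Perl. FASTA sequences and GFF3 feature lines are loaded into in-memory hashes, exons are looked up by type, and each exon's DNA is cut out by its coordinates. It is a stand-in and not the Bio::DB::SeqFeature::Store API. The subroutine names (`load_fasta`, `load_gff3`, `feature_dna`) and the toy data are hypothetical, and the sketch assumes well-formed, tab-delimited GFF3 with one location per feature line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a feature database: sequences and features are
# held in plain in-memory hashes instead of Bio::DB::SeqFeature::Store.

# Parse FASTA text into a hashref: seq_id => uppercase sequence string.
sub load_fasta {
    my ($text) = @_;
    my (%seqs, $id);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            (my $residues = $line) =~ s/\s+//g;
            $seqs{$id} .= uc $residues;
        }
    }
    return \%seqs;
}

# Parse tab-delimited GFF3 feature lines into a hashref: type => [features].
sub load_gff3 {
    my ($text) = @_;
    my %by_type;
    for my $line (split /\n/, $text) {
        last if $line =~ /^##FASTA/;    # embedded sequence section: stop here
        next if $line =~ /^#/ || $line =~ /^\s*$/;
        my ($seqid, undef, $type, $start, $end, undef, $strand, undef, $attrs)
            = split /\t/, $line;
        my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
                   split /;/, ($attrs // '');
        push @{ $by_type{$type} }, {
            seqid => $seqid, start => $start, end => $end,
            strand => $strand, id => $attr{ID},
        };
    }
    return \%by_type;
}

# Cut a feature's DNA out of its reference sequence. GFF coordinates are
# 1-based and inclusive; minus-strand features are reverse-complemented.
# Returns undef if the reference sequence is not loaded.
sub feature_dna {
    my ($seqs, $f) = @_;
    my $ref = $seqs->{ $f->{seqid} };
    return undef unless defined $ref;
    my $dna = substr($ref, $f->{start} - 1, $f->{end} - $f->{start} + 1);
    if ($f->{strand} eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/ACGT/TGCA/;
    }
    return $dna;
}

# Toy data (hypothetical): one 20 bp contig, one exon on each strand.
my $fasta = ">chr1\nACGTTGCAAC\nGGATCCTTAA\n";
my $gff = join "\n",
    "##gff-version 3",
    "chr1\texample\texon\t3\t6\t.\t+\t.\tID=exon1",
    "chr1\texample\texon\t10\t14\t.\t-\t.\tID=exon2";

my $seqs = load_fasta($fasta);
my $db   = load_gff3($gff);
for my $exon (@{ $db->{exon} || [] }) {
    printf "%s\t%s:%d-%d(%s)\t%s\n", $exon->{id}, $exon->{seqid},
        $exon->{start}, $exon->{end}, $exon->{strand}, feature_dna($seqs, $exon);
}
```

For the toy data this prints `exon1` with `GTTG` and `exon2` (minus strand) with `ATCCG`. A real database such as Bio::DB::SeqFeature::Store also provides what this sketch lacks: indexing for large genomes, parent/child feature relationships, and persistent storage.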

On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:

> Hi Kanmani,
> 
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
> 
> 
> Frank
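Whether a given GFF file carries its own sequence can be checked before falling back to an external FASTA file. In GFF3, a `##FASTA` line ends the annotation and everything after it is FASTA. This minimal sketch (core Perl only; `split_gff3` and the toy data are hypothetical) separates the two parts. It assumes the directive appears exactly as `##FASTA` on its own line and does no validation of the FASTA records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split GFF3 text into annotation lines and any sequences embedded after a
# "##FASTA" directive. Returns (arrayref of annotation lines,
# hashref of seq_id => sequence); the hash is empty when nothing is embedded.
sub split_gff3 {
    my ($text) = @_;
    my (@annotation, %seqs, $id);
    my $in_fasta = 0;
    for my $line (split /\n/, $text) {
        if (!$in_fasta) {
            if ($line =~ /^##FASTA\s*$/) {
                $in_fasta = 1;            # the rest of the file is FASTA
            }
            else {
                push @annotation, $line;
            }
        }
        elsif ($line =~ /^>(\S+)/) {
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            (my $residues = $line) =~ s/\s+//g;
            $seqs{$id} .= $residues;
        }
    }
    return (\@annotation, \%seqs);
}

# Toy example (hypothetical data): one feature plus an embedded contig.
my $gff3 = join "\n",
    "##gff-version 3",
    "ctg1\texample\texon\t2\t5\t.\t+\t.\tID=exon1",
    "##FASTA",
    ">ctg1",
    "AACCGGTT";

my ($annotation, $seqs) = split_gff3($gff3);
if (%$seqs) {
    print "embedded sequences: ", join(', ', sort keys %$seqs), "\n";
}
else {
    print "no embedded sequence; use an external FASTA file\n";
}
```

With the toy input it prints `embedded sequences: ctg1`; for a file without a `##FASTA` section the returned hash is empty.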
> 
> 
> 
> On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
>> Hi All,
>> I would like to get the DNA sequence from a GFF file. I'm using the
>> Bio::Tools::GFF module. I can get everything else, but not the DNA sequence.
>>
>> Can anyone help me figure this out, please? I would appreciate your help
>> very much.
>> thanks,
>> Kanmani
>> 
>> #!/usr/bin/perl
>> 
>> use strict;
>> use warnings;
>> use Bio::Tools::GFF;
>> 
>> my $file = shift;
>> 
>> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
>> $gffio->features_attached_to_seqs(1);
>> 
>> while (my $feat = $gffio->next_feature()) {
>>     my $start = $feat->start;
>>     my $end = $feat->end;
>>     my $size = $end - $start + 1;
>>     my $strand = $feat->strand;
>>     my $seqid = $feat->seq_id;
>>     my $score = $feat->score;
>>     my $frame = $feat->frame;
>>     my $source = $feat->source_tag;
>>     my $type = $feat->primary_tag;
>>     my $gffstr = $gffio->gff_string($feat);
>>     my @alltags = $feat->all_tags();
>>     my @ID_tag_value = $feat->each_tag_value("ID");
>>
>>     my $seq = $feat->seq();
>>     print "$seq\n";
>>
>>     if ($type eq "gene") {
>>         print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>>     }
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 