[Bioperl-l] getting DNA sequence for exon features from GFF
Chris Fields
cjfields at illinois.edu
Thu Aug 26 10:20:48 EDT 2010
Kanmani,
If you are using BioPerl, the best option currently available is to load a database with all the relevant information (GFF and FASTA), then query that database. The most commonly used ones now are Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very GFF3-centric, but I believe it can handle GFF/GTF, and it has various database adaptors (MySQL, Pg, BDB, SQLite).
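
A minimal sketch of the database approach, assuming the GFF3 file ends with a ##FASTA section (or that the path is a directory holding both GFF3 and FASTA files). It uses the in-memory adaptor of Bio::DB::SeqFeature::Store so that nothing has to be set up first; the sub name feature_sequences and the 'exon' type are only illustrative choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

# Load GFF3 (plus FASTA) into an in-memory store and return one row
# [seq_id, start, end, strand, dna] for every feature of the given type.
# Assumption: $path is a GFF3 file with an embedded ##FASTA section,
# or a directory containing GFF3 and FASTA files.
sub feature_sequences {
    my ($path, $type) = @_;

    my $db = Bio::DB::SeqFeature::Store->new(
        -adaptor => 'memory',
        -dsn     => $path,
    );

    my @rows;
    for my $feat ($db->features(-type => $type)) {
        # $feat->seq returns a sequence object; ->seq on that gives the string.
        my $dna = $feat->seq->seq;
        push @rows, [ $feat->seq_id, $feat->start, $feat->end, $feat->strand, $dna ];
    }
    return @rows;
}

# Command-line use: print one tab-separated line per exon.
if (@ARGV) {
    print join("\t", @$_), "\n" for feature_sequences($ARGV[0], 'exon');
}
```

Run it as `perl script.pl annotations.gff3`. For larger genomes one of the persistent adaptors is probably a better fit than 'memory'.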
chris
On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:
> Hi Kanmani,
>
> While GFF files may contain DNA sequence data, most of them don't, so
> you will have to use the location information you get from the GFF
> annotation file in conjunction with, e.g., a local FASTA database of the
> genomic sequence you are working with or an online resource.
>
> Frank
>
>
>
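
Frank's suggestion can be sketched in plain Perl, without BioPerl, to show what the coordinates mean: GFF positions are 1-based and inclusive, and a feature on the minus strand has to be reverse-complemented. The sub names (read_fasta, feature_seq, gff_feature_seqs) are made up for this illustration, and the whole FASTA file is held in memory, which is an assumption that only suits small genomes; Bio::DB::Fasta provides indexed access for bigger ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a FASTA file into a hash: sequence ID => sequence string.
sub read_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%seq, $id);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n$//;
        if ($line =~ /^>(\S+)/) { $id = $1; $seq{$id} = ''; }
        elsif (defined $id)     { $seq{$id} .= $line; }
    }
    close $fh;
    return \%seq;
}

# Extract a feature's DNA from 1-based, inclusive GFF coordinates.
# Minus-strand features are reverse-complemented.
sub feature_seq {
    my ($genome, $seqid, $start, $end, $strand) = @_;
    die "Unknown sequence '$seqid'" unless exists $genome->{$seqid};
    my $dna = substr($genome->{$seqid}, $start - 1, $end - $start + 1);
    if ($strand eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $dna;
}

# Return [ID, sequence] pairs for all features of one type in a GFF file.
sub gff_feature_seqs {
    my ($gff_path, $genome, $type) = @_;
    open my $fh, '<', $gff_path or die "Cannot open $gff_path: $!";
    my @out;
    while (my $line = <$fh>) {
        last if $line =~ /^##FASTA/;                  # embedded sequence section
        next if $line =~ /^#/ || $line !~ /\S/;       # comments and blank lines
        $line =~ s/\r?\n$//;
        my ($seqid, undef, $ftype, $start, $end, undef, $strand, undef, $attr)
            = split /\t/, $line;
        next unless defined $attr && $ftype eq $type;
        my ($id) = $attr =~ /(?:^|;)ID=([^;]+)/;
        push @out, [ $id // '.', feature_seq($genome, $seqid, $start, $end, $strand) ];
    }
    close $fh;
    return @out;
}

# Command-line use: perl script.pl annotations.gff genome.fasta
if (@ARGV == 2) {
    my ($gff, $fasta) = @ARGV;
    my $genome = read_fasta($fasta);
    print join("\t", @$_), "\n" for gff_feature_seqs($gff, $genome, 'exon');
}
```
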
> On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
>> Hi All,
>> I would like to get the DNA sequence from a GFF file. I'm using the
>> Bio::Tools::GFF module. I could get everything else but not the DNA seq.
>>
>> Can anyone help me figure this out, please? I would appreciate your help
>> very much.
>> Thanks,
>> Kanmani
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use Bio::Tools::GFF;
>>
>> my $file = shift;
>>
>> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
>> $gffio->features_attached_to_seqs(1);
>>
>> while (my $feat = $gffio->next_feature()) {
>>     my $start  = $feat->start;
>>     my $end    = $feat->end;
>>     my $size   = $end - $start + 1;
>>     my $strand = $feat->strand;
>>     my $seqid  = $feat->seq_id;
>>     my $score  = $feat->score;
>>     my $frame  = $feat->frame;
>>     my $source = $feat->source_tag;
>>     my $type   = $feat->primary_tag;
>>     my $gffstr = $gffio->gff_string($feat);
>>     my @alltags = $feat->all_tags();
>>     my @ID_tag_value = $feat->each_tag_value("ID");
>>
>>     # seq() returns a sequence object, and only when a sequence has
>>     # been attached to the feature; otherwise it is undefined.
>>     my $seq = $feat->seq();
>>     print $seq->seq, "\n" if defined $seq;
>>
>>     if ($type eq "gene") {
>>         print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
>>     }
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l