[Bioperl-l] getting DNA sequence for exon features from GFF

kanmani radha kanmaninradha at gmail.com
Thu Aug 26 12:22:14 EDT 2010


Hi Everyone,

Thanks very much for this clarification.  Thanks a ton to everyone who
spared the time to educate me.

I see your points.  Please correct me if I am wrong.

As I understand it, it is better to use Bio::DB::SeqFeature::Store or
Bio::DB::GFF to load the FASTA sequences (from a separate multi-FASTA file)
and Bio::Tools::GFF to parse the feature info from the GFF file, and then to
query the resulting database for the relevant GFF coordinates.
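
For the archive, here is a minimal sketch of that plan, not a tested
pipeline. It uses Bio::DB::Fasta as the indexed FASTA "database" rather than
a full Bio::DB::SeqFeature::Store, which is my own simplification. The sub
name exon_sequences and the file names are hypothetical. It assumes the
sequence IDs in column 1 of the GFF match the FASTA header IDs, and that the
directory holding the FASTA file is writable (Bio::DB::Fasta writes an index
file next to it).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::GFF;
use Bio::DB::Fasta;

# Return one [ID, DNA] pair per exon feature in the GFF3 file.
# (exon_sequences is a hypothetical helper name, not a BioPerl function.)
sub exon_sequences {
    my ($gff_file, $fasta_file) = @_;

    # Index the multi-FASTA file so subsequences can be fetched by coordinate.
    my $db    = Bio::DB::Fasta->new($fasta_file);
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);

    my @records;
    while (my $feat = $gffio->next_feature()) {
        next unless $feat->primary_tag eq 'exon';

        my ($start, $end) = ($feat->start, $feat->end);
        # Bio::DB::Fasta returns the reverse complement of DNA when
        # start > end, so swap the coordinates for minus-strand features.
        ($start, $end) = ($end, $start)
            if defined $feat->strand && $feat->strand < 0;

        my $dna = $db->seq($feat->seq_id, $start, $end);
        unless (defined $dna) {
            warn "No sequence found for ", $feat->seq_id, "\n";
            next;
        }

        my ($id) = $feat->has_tag('ID') ? $feat->get_tag_values('ID') : ('unknown');
        push @records, [$id, $dna];
    }
    $gffio->close;
    return @records;
}

# Hypothetical usage: perl exon_seqs.pl annotation.gff3 genome.fa
if (@ARGV == 2) {
    print ">$_->[0]\n$_->[1]\n" for exon_sequences(@ARGV);
}
```

Loading both the GFF and the FASTA into Bio::DB::SeqFeature::Store, as Chris
suggested, should be the better choice once the queries go beyond simple
coordinate lookups.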

I will implement this.

Thanks once again.
Kanmani

On Thu, Aug 26, 2010 at 7:20 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Kanmani,
>
> If you are using BioPerl, the best option currently available is to load a
> database with all relevant information (GFF and FASTA), then use that
> database for querying.  The most commonly-used ones now are
> Bio::DB::SeqFeature::Store and Bio::DB::GFF; the former is very
> GFF3-centric, but I believe it can handle GFF/GTF, and it has various
> database adaptors (MySQL, Pg, BDB, SQLite).
>
> chris
>
> On Aug 26, 2010, at 4:19 AM, Frank Schwach wrote:
>
> > Hi Kanmani,
> >
> > While GFF files may contain DNA sequence data, most of them don't, so
> > you will have to use the location information you get from the GFF
> > annotation file in conjunction with, e.g., a local FASTA database of the
> > genomic sequence you are working with or an online resource.
> >
> >
> > Frank
> >
> >
> >
> > On Thu, 2010-08-26 at 01:29 -0700, kanmani radha wrote:
> >> Hi All,
> >> I would like to get the DNA seq from GFF file. I'm using Bio::Tools::GFF
> >> module. I could get everything else but not the DNA seq.
> >>
> >> Can anyone help me to find this out, Please. I appreciate your help very
> >> much.
> >> thanks,
> >> Kanmani
> >>
> >> #!/usr/bin/perl
> >>
> >> use strict;
> >> use warnings;
> >> use Bio::Tools::GFF;
> >>
> >> my $file = shift;
> >>
> >> my $gffio = Bio::Tools::GFF->new(-file => $file, -gff_version => 3);
> >> $gffio->features_attached_to_seqs(1);
> >>
> >> while (my $feat = $gffio->next_feature()){
> >>    my $start = $feat->start;
> >>    my $end= $feat->end;
> >>    my $size = $end-$start+1;
> >>    my $strand = $feat->strand;
> >>    my $seqid = $feat->seq_id;
> >>    my $score = $feat->score;
> >>    my $frame = $feat->frame;
> >>    my $source = $feat->source_tag;
> >>    my $type = $feat->primary_tag;
> >>    my $gffstr = $gffio->gff_string($feat);
> >>    my @alltags = $feat->all_tags();
> >>    my @ID_tag_value = $feat->each_tag_value("ID");
> >>
> >>    my  $seq = $feat->seq();
> >>    print "$seq\n";
> >>
> >>    if ($type eq "gene") {
> >>       print "@ID_tag_value\t$size\t$type\t$start\t$end\n";
> >>    }
> >> }
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> > --
> > The Wellcome Trust Sanger Institute is operated by Genome Research
> > Limited, a charity registered in England with number 1021457 and a
> > company registered in England with number 2742969, whose registered
> > office is 215 Euston Road, London, NW1 2BE.
>
>

