[Bioperl-l] fasta2gff_landmark ...
Scott Cain
scott at scottcain.net
Wed May 13 13:17:48 EDT 2009
Hi Dan,
I've already done this; take a look at:
http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/schema/chado/bin/gmod_fasta2gff3.pl
While it is part of the GMOD/Chado distribution, the only modules it
uses is Bio::DB::Fasta and GetOpt::Long, and it has some nice
convenience functions for modifying the resulting GFF.
Scott
On Wed, May 13, 2009 at 1:01 PM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi,
>
> I am trying to create a script to write one 'GFF landmark' line for
> each entry in a fasta file. I'm doing this so that the landmarks show
> up in GBrowse (along with my attached similarity 'features').
>
> In a nutshell, I was told to add lines like:
>
> C08HBa0216M19.1 . contig 1 <length> . . .
> ID=C08HBa0216M19.1;Name=C08HBa0216M19.1
>
>
> to the GFF so that GBrowse would pick up the features which have
> C08HBa0216M19.1 as their 'landmark'. This seems to be working great
> for the sequences where I added this line manually, but now I want to
> write a script to automatically generate these GFF landmarks from a
> fasta file.
>
>
> So far I have a script that looks like this:
>
> <source lang="perl">
> #!/usr/bin/perl -w
>
> use strict;
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use Bio::FeatureIO;
>
> [snip]
>
> my $gffIO =
> new Bio::FeatureIO( -format => 'GFF',
> -version => 3
> );
>
> ## Now read in the fasta file to create the landmarks
>
> my $seqIO =
> new Bio::SeqIO( -file => $finished_bacs_repeatmasked,
> -format => 'fasta'
> );
>
> ## Begwin
>
> while(my $seq = $seqIO->next_seq){
> ## Create a new generic feature to hold the properties of the
> ## sequence;
>
> my $name = $seq->id;
> my $leng = $seq->length;
>
> my $accn;
>
> next unless $accn = $finished{$name};
>
> my $f =
> new Bio::SeqFeature::Generic( -seq_id => $name,
> -primary => "SGN",
> -source => "genomic_clone",
> -start => 1,
> -end => $leng,
> -tag => { ID => $name,
> Name => $name,
> Dbxref => "GB:$accn",
> Ontology_term => "SO:0000040"
> }
> );
>
> print $gffIO->write_feature($f);
> exit;
> }
> </source>
>
>
> The script gives me the following error:
>
> Use of uninitialized value in pattern match (m//) at
> ~/perl5/lib/perl5/Bio/FeatureIO/gff.pm line 107, <FB> line 688.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: only Bio::SeqFeature::Annotated objects are writeable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
> STACK: Bio::FeatureIO::gff::write_feature
> ~/perl5/lib/perl5/Bio/FeatureIO/gff.pm:272
> STACK: ./fasta2gff_landmarks.plx:72
> -----------------------------------------------------------
>
>
> With all due respect to:
>
> * http://www.bioperl.org/wiki/GFF_Refactor and
> * http://www.bioperl.org/wiki/GFF_code_audit
>
>
> I'm a bit stuck.
>
> Any quick fix in the meantime? I realise that using a feature object
> is overkill here, but I wanted to learn the 'BioPerl way' of doing
> things. Should I drop that approach until the code audit and
> refactorization is over, or is this the kind of use-case that can help
> with that process?
>
>
> Thanks for any pointers,
> Dan.
>
>
> P.S. Thanks also to the friendly people on
> irc://ircfreenode.net/#bioperl and
> irc://irc.freenode.net/#bioinformatics who helped me get this script
> together as it is.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
More information about the Bioperl-l
mailing list