[Bioperl-l] fasta2gff_landmark ...

Wed May 13 13:17:48 EDT 2009

Hi Dan,

I've already done this; take a look at:

http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/schema/chado/bin/gmod_fasta2gff3.pl

While it is part of the GMOD/Chado distribution, the only modules it
uses is Bio::DB::Fasta and GetOpt::Long, and it has some nice
convenience functions for modifying the resulting GFF.

Scott

On Wed, May 13, 2009 at 1:01 PM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi,
>
> I am trying to create a script to write one 'GFF landmark' line for
> each entry in a fasta file. I'm doing this so that the landmarks show
> up in GBrowse (along with my attached similarity 'features').
>
> In a nutshell, I was told to add lines like:
>
> C08HBa0216M19.1    .    contig    1    <length>    .    .    .
> ID=C08HBa0216M19.1;Name=C08HBa0216M19.1
>
>
> to the GFF so that GBrowse would pick up the features which have
> C08HBa0216M19.1 as their 'landmark'. This seems to be working great
> for the sequences where I added this line manually, but now I want to
> write a script to automatically generate these GFF landmarks from a
> fasta file.
>
>
> So far I have a script that looks like this:
>
> <source lang="perl">
> #!/usr/bin/perl -w
>
> use strict;
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use Bio::FeatureIO;
>
> [snip]
>
> my $gffIO =
>    new Bio::FeatureIO( -format => 'GFF',
>                        -version => 3
>                        );
>
> ## Now read in the fasta file to create the landmarks
>
> my $seqIO =
>  new Bio::SeqIO( -file => $finished_bacs_repeatmasked,
>                  -format => 'fasta'
>                );
>
> ## Begwin
>
> while(my $seq = $seqIO->next_seq){
>  ## Create a new generic feature to hold the properties of the
>  ## sequence;
>
>  my $name = $seq->id;
>  my $leng = $seq->length;
>
>  my $accn;
>
>  next unless $accn = $finished{$name};
>
>  my $f =
>    new Bio::SeqFeature::Generic( -seq_id  => $name,
>                                  -primary => "SGN",
>                                  -source  => "genomic_clone",
>                                  -start   =>     1,
>                                  -end     => $leng,
>                                  -tag     => { ID   => $name,
>                                                Name => $name,
>                                                Dbxref => "GB:$accn",
>                                                Ontology_term => "SO:0000040"
>                                              }
>                                );
>
>  print $gffIO->write_feature($f);
>  exit;
> }
> </source>
>
>
> The script gives me the following error:
>
> Use of uninitialized value in pattern match (m//) at
> ~/perl5/lib/perl5/Bio/FeatureIO/gff.pm line 107, <FB> line 688.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: only Bio::SeqFeature::Annotated objects are writeable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
> STACK: Bio::FeatureIO::gff::write_feature
> ~/perl5/lib/perl5/Bio/FeatureIO/gff.pm:272
> STACK: ./fasta2gff_landmarks.plx:72
> -----------------------------------------------------------
>
>
> With all due respect to:
>
> * http://www.bioperl.org/wiki/GFF_Refactor and
> * http://www.bioperl.org/wiki/GFF_code_audit
>
>
> I'm a bit stuck.
>
> Any quick fix in the meantime? I realise that using a feature object
> is overkill here, but I wanted to learn the 'BioPerl way' of doing
> things. Should I drop that approach until the code audit and
> refactorization is over, or is this the kind of use-case that can help
> with that process?
>
>
> Thanks for any pointers,
> Dan.
>
>
> P.S. Thanks also to the friendly people on
> irc://ircfreenode.net/#bioperl and
> irc://irc.freenode.net/#bioinformatics who helped me get this script
> together as it is.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research