[Bioperl-l] How do you create a genbank file?

Jason Stajich jason at cgt.duhs.duke.edu
Fri Oct 17 08:27:27 EDT 2003


If you are parsing FASTA files with properly formatted NCBI headers, I thought we
had added a way to have all of this set automatically; perhaps not...

You can set the GI number with
$seq->primary_id($ginumber);
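
If your BioPerl version does not fill these fields in from NCBI-style headers, one manual fallback is to split the FASTA ID yourself. The helper parse_ncbi_id is hypothetical (it is not a BioPerl function) and assumes IDs of the form gi|26453358|gb|AB050006.1|, that is, the GI, a database tag, and ACCESSION.VERSION; anything else returns an empty list.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split an NCBI-style FASTA ID
# such as "gi|26453358|gb|AB050006.1|" into GI, accession and version.
# Returns an empty list when the ID does not follow that layout.
sub parse_ncbi_id {
    my ($id) = @_;
    my ($gi, $acc, $ver) = $id =~ /^gi\|(\d+)\|\w+\|([^.|]+)\.(\d+)\|/
        or return;
    return (gi => $gi, accession => $acc, version => $ver);
}
```

Inside a next_seq loop, the returned values could then be handed to $seq->primary_id, $seq->accession_number and $seq->seq_version; note that seq_version is only available on Bio::Seq::RichSeq objects.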

If you want to create Bio::Seq::RichSeq objects instead of Bio::Seq objects
(for example, to set fields such as the molecule type or division, which are
only available in RichSeq objects), initialize your Bio::SeqIO FASTA parser
like this:

use Bio::SeqIO;
use Bio::Seq::SeqFactory;

# Have the FASTA parser build Bio::Seq::RichSeq objects instead of Bio::Seq
my $seqio = Bio::SeqIO->new(-format     => 'fasta',
                            -file       => $file,
                            -seqfactory => Bio::Seq::SeqFactory->new(
                                               -type => 'Bio::Seq::RichSeq'));

Alternatively, you can set the sequence factory after the SeqIO object has
been created:
$seqio->sequence_factory(Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::RichSeq'));
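
Putting the pieces of this thread together, a FASTA-to-GenBank conversion might look roughly like the sketch below. The subroutine name fasta_to_genbank is hypothetical, the field values are hard-coded from the example record in the question, and it assumes (as this reply suggests) that the GenBank writer takes the GI from primary_id when printing the VERSION line; output details may differ between BioPerl releases.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;
use Bio::Species;

# Hypothetical helper: read FASTA, fill in RichSeq-only fields, write GenBank.
# All values are illustrative, taken from the record in the question.
sub fasta_to_genbank {
    my ($in_file, $out_file) = @_;

    # RichSeq objects provide molecule(), division() and seq_version()
    my $in = Bio::SeqIO->new(
        -format     => 'fasta',
        -file       => $in_file,
        -seqfactory => Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::RichSeq'),
    );
    my $out = Bio::SeqIO->new(-format => 'genbank', -file => ">$out_file");

    while (my $seq = $in->next_seq) {
        $seq->molecule('mRNA');              # LOCUS molecule type instead of DNA
        $seq->division('MAM');               # LOCUS division instead of UNK
        $seq->accession_number('AB050006');  # accession part of VERSION
        $seq->seq_version(1);                # VERSION AB050006.1
        $seq->primary_id(26453358);          # GI number (assumed to print on VERSION)

        # Classification runs from the species epithet upwards,
        # so 'Bos taurus' becomes ('taurus', 'Bos').
        my $organism = 'Bos taurus';
        $seq->species(Bio::Species->new(
            -classification => [reverse split /\s+/, $organism]));

        $out->write_seq($seq);
    }
    $out->close;
}

# Run as a script: perl fasta2gb.pl input.fa output.gb
fasta_to_genbank(@ARGV) if @ARGV == 2;
```

Only a two-level classification can be built from the FASTA data here, so the ORGANISM lineage will be short, as shown later in this thread.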

-jason

On Fri, 17 Oct 2003, Marc Logghe wrote:

> > My question is how do I set the following?
> >
> > mRNA (instead of dna)
> > MAM (instead of UNK)
> > VERSION     AB050006.1  GI:26453358   <- I can't get this line to appear
> > SOURCE      Bos taurus (cow)
> >    ORGANISM  Bos taurus
> >
> >
> To set the version you should use:
> $seq->seq_version($version);  # $version is e.g. 1
> The problem is that it is not possible to set the GI number. As far as I know, when you parse a GenBank file, Bio::SeqIO does not even read the GI; at least, it does not show up when you Data::Dump the resulting Bio::Seq::RichSeq object.
> There is no slot for that information, because it does not exist in, e.g., an EMBL sequence record.
> As for the organism, first create a Bio::Species object. If all you have is the string 'Bos taurus' in your FASTA file, you obviously cannot generate the full classification, at least not from the FASTA data alone.
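> use Bio::Species;
> # classification runs from species upwards: 'Bos taurus' -> ('taurus', 'Bos')
> my $species = Bio::Species->new(-classification => [reverse split /\s+/, $organism]);
> $seq->species($species);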
>
> gives you:
> SOURCE      Bos taurus
>   ORGANISM  Bos taurus
>             Bos.
>
> HTH,
> Marc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

