[Bioperl-l] how to set first line of genbank file

Jason Stajich jason at cgt.duhs.duke.edu
Mon Nov 3 16:00:54 EST 2003


You'll want to read up on Bio::Seq::RichSeq, as that is where these 'extra'
fields come from when writing GenBank/EMBL.

The LOCUS name comes from $seq->display_id.
The length is computed from the length of the sequence.
$seq->molecule() defines what goes in as 'DNA'.
$seq->is_circular() is how 'linear' (or 'circular') gets set.
$seq->division() sets the division (PLN).
$seq->get_dates() gets the dates (the 1st one in the list is what is put
                  here).  See Bio::Seq::RichSeq for the add_date() method
                  to add your own.  It is a little bit of work to remove
                  a date; ask if you need to do this.
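
As a rough sketch of how the fields above fit together (not tested against
every Bioperl release), the following builds a RichSeq by hand and writes it
out in GenBank format.  The sequence string and description are made up for
illustration; only the display_id, molecule, division and date are taken from
the LOCUS line in the question.  The exact column layout of the LOCUS line
depends on your Bioperl version, so no expected output is shown.

```perl
use strict;
use warnings;
use Bio::Seq::RichSeq;
use Bio::SeqIO;

# Placeholder sequence (the real record is 17385 bp); made up for this sketch.
my $seq = Bio::Seq::RichSeq->new(
    -seq        => 'ATGGCGTACGATCGATCGTAGCTAGCTAGCTAGCATCGATCG',
    -display_id => 'OSA277468',        # becomes the LOCUS name
    -desc       => 'example record',   # DEFINITION line (illustrative text)
    -alphabet   => 'dna',
);

$seq->molecule('DNA');          # molecule type on the LOCUS line
$seq->is_circular(0);           # false => not circular
$seq->division('PLN');          # GenBank division
$seq->add_date('23-OCT-2003');  # first date in the list is the one written

# Write into an in-memory buffer so the LOCUS line can be inspected.
my $buf = '';
open(my $fh, '>', \$buf) or die "cannot open in-memory filehandle: $!";
my $out = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');
$out->write_seq($seq);
$out->close;

# The LOCUS line is the first line of the record.
my ($locus_line) = split /\n/, $buf;
print "$locus_line\n";
```

To write to a file instead, pass -file => '>out.gb' (a hypothetical filename)
to Bio::SeqIO->new in place of -fh.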

Beyond the information above, you may have to read Bio::SeqIO::genbank
itself.

I did add a table to the documentation which attempts to map these fields
into the Bioperl data structures for you.


[from Bio::SeqIO::genbank]
Items listed as Annotation 'NAME' tell you the data is stored in the
associated Bio::Annotation::Collection object, which is associated with
Bio::Seq objects.  If it is explicitly requested that no annotations
should be stored when parsing a record, then of course they won't be
available when you try to get them.  If you are having this problem,
look at the type of SeqBuilder that is being used to construct your
sequence object.

Comments             Annotation 'comment'
References           Annotation 'reference'
Segment              Annotation 'segment'
Origin               Annotation 'origin'

Accessions           PrimarySeq accession_number()
Secondary accessions RichSeq get_secondary_accessions()
Keywords             RichSeq keywords()
Dates                RichSeq get_dates()
Molecule             RichSeq molecule()
Seq Version          RichSeq seq_version()
PID                  RichSeq pid()
Division             RichSeq division()
Features             Seq get_SeqFeatures()
Alphabet             PrimarySeq alphabet()
Definition           PrimarySeq description() or desc()
Version              PrimarySeq version()

Sequence             PrimarySeq seq()
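
A small sketch of how a few rows of this table translate into calls, assuming
a hand-built RichSeq rather than a parsed record.  The identifier, secondary
accession, description and comment text are all invented for illustration.

```perl
use strict;
use warnings;
use Bio::Seq::RichSeq;
use Bio::Annotation::Comment;

# Hand-built record; every value here is made up for the example.
my $seq = Bio::Seq::RichSeq->new(
    -seq        => 'ATGGCGTACGATCG',
    -display_id => 'EXAMPLE01',
    -desc       => 'made-up record for illustration',
    -alphabet   => 'dna',
);
$seq->division('PLN');
$seq->molecule('DNA');
$seq->add_date('23-OCT-2003');
$seq->add_secondary_accession('X00001');

# "Comments  Annotation 'comment'": stored in the annotation collection.
$seq->annotation->add_Annotation(
    'comment',
    Bio::Annotation::Comment->new(-text => 'hand-built comment'),
);

# RichSeq rows of the table
my @dates     = $seq->get_dates;
my @secondary = $seq->get_secondary_accessions;
my $division  = $seq->division;
my $molecule  = $seq->molecule;

# PrimarySeq rows of the table
my $definition = $seq->desc;
my $alphabet   = $seq->alphabet;
my $residues   = $seq->seq;

# Annotation rows of the table
my @comments = map { $_->text } $seq->annotation->get_Annotations('comment');

print "Dates:      @dates\n";
print "Secondary:  @secondary\n";
print "Division:   $division\n";
print "Molecule:   $molecule\n";
print "Definition: $definition\n";
print "Alphabet:   $alphabet\n";
print "Comments:   @comments\n";
```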


-jason

On Mon, 3 Nov 2003, Magic Fang wrote:

> the standard first line of genbank file is:
> LOCUS       OSA277468              17385 bp    DNA     linear   PLN 23-OCT-2003
> how to set the it when use bioperl to create genbank file.
> thank u.

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

