[BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 25 08:57:51 EDT 2011


On Mon, Jul 25, 2011 at 1:03 PM, Roy Chaudhuri <roy.chaudhuri at gmail.com> wrote:
> I don't think there's any specific handling, but (in GenBank files at least)
> mol_type is recorded as a tag in the source feature, so it will be stored in
> BioSQL like any other feature tag (in seqfeature_qualifier_value).
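For illustration, here is a minimal Python/sqlite3 sketch of how such a source-feature /mol_type value could be read back from the seqfeature_qualifier_value table. It assumes a heavily simplified subset of the BioSQL schema (no ontology table, fewer columns and constraints), and the ids and values are made up. It is not how BioPerl or Biopython actually query the database.

```python
import sqlite3

# Heavily simplified subset of the BioSQL schema (the real tables have more
# columns, ontology links and constraints); for illustration only.
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id INTEGER NOT NULL,
    type_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    rank INTEGER NOT NULL DEFAULT 0,
    value TEXT NOT NULL
);
"""


def source_mol_types(conn, bioentry_id):
    """Return the /mol_type values of every source feature of one bioentry."""
    rows = conn.execute(
        """
        SELECT q.value
        FROM seqfeature AS f
        JOIN term AS ft ON ft.term_id = f.type_term_id
        JOIN seqfeature_qualifier_value AS q ON q.seqfeature_id = f.seqfeature_id
        JOIN term AS qt ON qt.term_id = q.term_id
        WHERE f.bioentry_id = ? AND ft.name = 'source' AND qt.name = 'mol_type'
        ORDER BY f.seqfeature_id, q.rank
        """,
        (bioentry_id,),
    )
    return [value for (value,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "source"), (2, "mol_type")])
# Made-up data: one source feature on bioentry 1 tagged /mol_type="genomic DNA".
conn.execute("INSERT INTO seqfeature VALUES (10, 1, 1)")
conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (10, 2, 1, 'genomic DNA')")
print(source_mol_types(conn, 1))  # ['genomic DNA']
```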

In my question I'd forgotten about this potential (slight) redundancy
in the GenBank format!

Consider this example: the molecule type is only in the LOCUS
line (DNA), and incidentally there are two source features:

http://biopython.org/SRC/biopython/Tests/GenBank/NT_019265.gb

Likewise, in the current version of the sample record on the NCBI
website, the molecule type is only in the LOCUS line and not in the
source feature (again just DNA in this case, although the page
mentions other possible values):

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.htm

However, in this third example, the molecule type is in both the
LOCUS line (as DNA) and the source feature (as genomic DNA):

http://biopython.org/SRC/biopython/Tests/GenBank/NC_000932.gb
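To make the redundancy concrete, a small stdlib-only Python sketch can pull the molecule type from the LOCUS line and any /mol_type qualifiers from the features of a raw GenBank record. The record below is a hypothetical toy example, and the parsing is deliberately naive (it assumes the "<length> bp <molecule type>" LOCUS layout and single-line qualifier values); real code should use a proper parser such as Bio.SeqIO. The same LOCUS line also carries the linear/circular topology that the subject line asks about.

```python
import re

# Hypothetical toy record, loosely following the GenBank flat-file layout.
TOY_RECORD = """\
LOCUS       TOY0001                   60 bp    DNA     circular BCT 25-JUL-2011
DEFINITION  Hypothetical toy record for illustration.
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="Example organism"
                     /mol_type="genomic DNA"
//
"""


def parse_locus_molecule(text):
    """Return (molecule_type, topology) from the LOCUS line; either may be None."""
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            # Naive: assumes "<length> bp <moltype>"; protein records use "aa".
            if "bp" not in tokens:
                return None, None
            i = tokens.index("bp")
            mol_type = tokens[i + 1] if len(tokens) > i + 1 else None
            following = tokens[i + 2] if len(tokens) > i + 2 else None
            topology = following if following in ("linear", "circular") else None
            return mol_type, topology
    return None, None


def parse_source_mol_types(text):
    """Return all /mol_type qualifier values (in practice only on source features)."""
    return re.findall(r'/mol_type="([^"]*)"', text)


print(parse_locus_molecule(TOY_RECORD))    # ('DNA', 'circular')
print(parse_source_mol_types(TOY_RECORD))  # ['genomic DNA']
```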

Mapping the GenBank/EMBL feature annotation to BioSQL is quite
straightforward (and I'm pretty sure Biopython and BioPerl are
consistent here). It's all the header information that isn't as
well pinned down.

To clarify, I'm interested in whether and where BioPerl stores the
molecule type from the GenBank LOCUS line in BioSQL (I expect it
to go in the bioentry_qualifier_value table under some tag name).
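A minimal sketch of that expectation: the LOCUS molecule type stored per bioentry as a row in bioentry_qualifier_value, keyed by a term. The tag name `molecule_type` below is purely hypothetical (which name BioPerl uses, if any, is exactly the open question), and the schema is a simplified subset of BioSQL without the ontology table.

```python
import sqlite3

# Simplified subset of the BioSQL schema, for illustration only.
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0
);
"""

# Hypothetical tag name; not confirmed for BioPerl or any other Bio* project.
MOLTYPE_TAG = "molecule_type"


def get_or_create_term(conn, name):
    """Look up a term by name, inserting it if it does not exist yet."""
    row = conn.execute("SELECT term_id FROM term WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO term (name) VALUES (?)", (name,)).lastrowid


def store_qualifier(conn, bioentry_id, tag, value, rank=0):
    """Attach one tag/value annotation to a bioentry."""
    conn.execute(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        (bioentry_id, get_or_create_term(conn, tag), value, rank),
    )


def fetch_qualifier(conn, bioentry_id, tag):
    """Return all values stored for one tag on a bioentry, ordered by rank."""
    rows = conn.execute(
        """SELECT v.value FROM bioentry_qualifier_value AS v
           JOIN term AS t ON t.term_id = v.term_id
           WHERE v.bioentry_id = ? AND t.name = ? ORDER BY v.rank""",
        (bioentry_id, tag),
    )
    return [value for (value,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_qualifier(conn, 1, MOLTYPE_TAG, "DNA")  # molecule type taken from LOCUS
print(fetch_qualifier(conn, 1, MOLTYPE_TAG))  # ['DNA']
```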

Thanks again,

Peter

P.S.

As has been discussed before, the BioSQL documentation would
benefit from at least one worked example of a (small) GenBank
file showing where each field ends up in the database. It would be
a reasonable amount of work, but it could then be used as a basic
compliance unit test by all the Bio* interfaces to BioSQL.
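Such a compliance test might look something like the following Python/sqlite3 sketch: a table of expected (table, column) to value pairs for one hypothetical toy record, checked against a database after loading. The rows are inserted by hand here to stand in for a Bio* loader, the schema is a cut-down subset of the BioSQL bioentry and biosequence tables, and header fields like the LOCUS molecule type are left out because their mapping is what remains unsettled.

```python
import sqlite3

# Hypothetical worked example for one toy record: where each field should land.
EXPECTED = {
    ("bioentry", "name"): "TOY0001",
    ("bioentry", "accession"): "TOY0001",
    ("bioentry", "version"): 1,
    ("bioentry", "division"): "BCT",
    ("bioentry", "description"): "Hypothetical toy record",
    ("biosequence", "length"): 60,
}


def check_mapping(conn, bioentry_id, expected=EXPECTED):
    """Return a list of (table, column, expected, found) mismatches."""
    problems = []
    for (table, column), want in expected.items():
        # Table/column names come from the fixed dict above, never user input.
        row = conn.execute(
            f"SELECT {column} FROM {table} WHERE bioentry_id = ?", (bioentry_id,)
        ).fetchone()
        found = row[0] if row else None
        if found != want:
            problems.append((table, column, want, found))
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT, accession TEXT,
                       version INTEGER, division TEXT, description TEXT);
CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, length INTEGER,
                          alphabet TEXT, seq TEXT);
""")
# Hand-written rows standing in for what a Bio* loader might write.
conn.execute("INSERT INTO bioentry VALUES (1, 'TOY0001', 'TOY0001', 1, 'BCT', "
             "'Hypothetical toy record')")
conn.execute("INSERT INTO biosequence VALUES (1, 60, 'dna', ?)", ("ACGT" * 15,))
print(check_mapping(conn, 1))  # []
```

Each Bio* project could then load the same small file with its own code and run the same expectations against the resulting database.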

