[BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?
Roy Chaudhuri
roy.chaudhuri at gmail.com
Mon Jul 25 14:12:38 UTC 2011
>> I don't think there's any specific handling, but (in GenBank files
>> at least) mol_type is recorded as a tag in the source feature, so
>> it will be stored in BioSQL like any other feature tag (in
>> seqfeature_qualifier_value).
>
> I'd forgotten in my question this potential slight redundancy in the
> GenBank format!
No problem, I forgot in my answer that for some obscure reason people
may be interested in looking at GenBank files that aren't bacterial
genome sequences.
> Let me clarify that I'm interested in if and where BioPerl stores
> the molecule type from the GenBank LOCUS line in BioSQL (and I'm
> expecting this to go in bioentry_qualifier_value table under some tag
> name).
As far as I can tell, the only fields stored by default in
bioentry_qualifier_value are keyword, date_changed and
secondary_accession (although my database only contains GenBank
bacterial genomes). As with the is_circular hack, you could store the
molecule type by adding it as an annotation in the SequenceProcessor
(it's stored as $seq->molecule by BioPerl).
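For anyone wanting to check which tags are in use in their own database, here is a minimal sketch of the lookup in Python. It runs against an in-memory SQLite database with a cut-down version of the term and bioentry_qualifier_value tables; the reduced column set, the toy rows and the qualifier_tags helper are all assumptions made for illustration, not something taken from a real BioSQL instance.

```python
import sqlite3

# Simplified subset of the BioSQL schema (assumption: only the columns
# needed here; the real tables carry more columns and constraints).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    value       TEXT,
    rank        INTEGER NOT NULL DEFAULT 0
);
""")

# Toy rows standing in for loaded GenBank entries (hypothetical values).
conn.executemany(
    "INSERT INTO term (term_id, name) VALUES (?, ?)",
    [(1, "keyword"), (2, "date_changed"), (3, "secondary_accession")],
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, value, rank) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, "complete genome", 1),
        (1, 2, "25-JUL-2011", 1),
        (1, 3, "TEST0002", 1),
        (2, 1, "complete genome", 1),
    ],
)


def qualifier_tags(connection):
    """Return (tag name, row count) pairs for bioentry_qualifier_value."""
    return connection.execute(
        """
        SELECT t.name, COUNT(*)
        FROM bioentry_qualifier_value q
        JOIN term t ON t.term_id = q.term_id
        GROUP BY t.name
        ORDER BY t.name
        """
    ).fetchall()


tags = qualifier_tags(conn)
for name, count in tags:
    print(name, count)
```

The same SELECT should work against a real BioSQL database through the appropriate driver; a molecule type annotation added in the SequenceProcessor would then be expected to show up as an extra tag name in this list.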
Actually, when round-tripping a GenBank file through BioSQL, the LOCUS
line molecule type ends up in lower case, which makes me wonder if it is
coming from alphabet in the biosequence table.
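One way to test that guess is to compare the LOCUS line molecule type with the alphabet column directly. The sketch below does this in Python on an in-memory SQLite database; the reduced bioentry and biosequence tables, the TEST0001 accession and the stored_alphabet helper are hypothetical and only illustrate the comparison. It does not show what BioPerl actually does when writing the file back out.

```python
import sqlite3

# Cut-down bioentry and biosequence tables (assumption: the real BioSQL
# tables have more columns, e.g. version on biosequence).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length      INTEGER,
    alphabet    TEXT,
    seq         TEXT
);
""")

# Toy entry (hypothetical accession and sequence).
conn.execute("INSERT INTO bioentry (bioentry_id, accession) VALUES (1, 'TEST0001')")
conn.execute(
    "INSERT INTO biosequence (bioentry_id, length, alphabet, seq) "
    "VALUES (1, 8, 'dna', 'ACGTACGT')"
)


def stored_alphabet(connection, accession):
    """Return the biosequence.alphabet value for an accession, or None."""
    row = connection.execute(
        """
        SELECT s.alphabet
        FROM biosequence s
        JOIN bioentry b ON b.bioentry_id = s.bioentry_id
        WHERE b.accession = ?
        """,
        (accession,),
    ).fetchone()
    return row[0] if row else None


# Molecule type as written on the original GenBank LOCUS line.
locus_molecule = "DNA"

alphabet = stored_alphabet(conn, "TEST0001")
# True when the stored alphabet is just the lower-cased LOCUS value.
matches = alphabet is not None and alphabet == locus_molecule.lower()
print(alphabet, matches)
```

A match here would be consistent with the round-tripped value coming from alphabet, but it would not rule out some other source.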
> P.S.
>
> As has been discussed before, the BioSQL documentation would benefit
> from at least one worked example of a (small) GenBank file showing
> where each field ends up in the database. It would be a reasonable
> amount of work though - but could then be used for a basic compliance
> unit test by all the Bio* interfaces to BioSQL.
I agree that this would be very useful - the SearchIO HOWTO has a
similar treatment of a BLAST report that I often refer to.