[BioSQL-l] How is is_circular recorded in BioSQL (by BioPerl)?

Roy Chaudhuri roy.chaudhuri at gmail.com
Mon Jul 25 14:12:38 UTC 2011


>> I don't think there's any specific handling, but (in GenBank files
>> at least) mol_type is recorded as a tag in the source feature, so
>> it will be stored in BioSQL like any other feature tag (in
>> seqfeature_qualifier_value).
>
> I'd forgotten in my question this potential slight redundancy in the
> GenBank format!

No problem, I forgot in my answer that for some obscure reason people 
may be interested in looking at GenBank files that aren't bacterial 
genome sequences.

> Let me clarify that I'm interested in if and where BioPerl stores
> the molecule type from the GenBank LOCUS line in BioSQL (and I'm
> expecting this to go in bioentry_qualifier_value table under some tag
> name).

As far as I can tell, the only fields stored by default in 
bioentry_qualifier_value are keyword, date_changed and 
secondary_accession (although my database only contains GenBank 
bacterial genomes). As with the is_circular hack, you could store the 
molecule type by adding it as an annotation in the SequenceProcessor 
(it's stored as $seq->molecule by BioPerl).
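
To see which tags are actually present in your own database, you can join
bioentry_qualifier_value to term and count the rows per tag name. Below is a
minimal sketch in Python using an in-memory SQLite database. The two tables
are cut-down stand-ins for the real BioSQL tables (only the columns needed
for the join), and the rows are invented for illustration, so treat this as
a pattern for the query rather than a picture of what BioPerl writes.

```python
import sqlite3

# Simplified stand-ins for two BioSQL tables. The real schema has more
# columns and constraints; only what the join needs is kept here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    value       TEXT,
    rank        INTEGER NOT NULL DEFAULT 0
);
""")

# Invented example rows (assumption): the three tag names mentioned above.
conn.executemany(
    "INSERT INTO term (term_id, name) VALUES (?, ?)",
    [(1, "keyword"), (2, "date_changed"), (3, "secondary_accession")],
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, value, rank) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, "complete genome", 1),
        (1, 2, "25-JUL-2011", 1),
        (1, 3, "ACC_PLACEHOLDER", 1),
        (2, 1, "plasmid", 1),
    ],
)


def qualifier_tag_counts(connection):
    """Return (tag name, row count) pairs for bioentry_qualifier_value."""
    return connection.execute(
        """
        SELECT t.name, COUNT(*)
        FROM bioentry_qualifier_value AS q
        JOIN term AS t ON t.term_id = q.term_id
        GROUP BY t.name
        ORDER BY t.name
        """
    ).fetchall()


tag_counts = qualifier_tag_counts(conn)
for name, count in tag_counts:
    print(name, count)
```

The SELECT statement itself should run unchanged against a real BioSQL
database; a molecule type added as an annotation would then show up as an
extra tag name in the result.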

Actually, when round-tripping a GenBank file through BioSQL, the LOCUS 
line molecule type ends up in lower case, which makes me wonder if it is 
coming from alphabet in the biosequence table.
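
One way to check this guess is to look at what is stored in
biosequence.alphabet for the entry and compare it with the LOCUS line of the
original file. The sketch below does that in Python against an in-memory
SQLite table. The table is a simplified stand-in for the BioSQL biosequence
table, and the row and the LOCUS value are made up for illustration; it does
not prove where BioPerl takes the value from, it only shows the comparison.

```python
import sqlite3

# Simplified stand-in for the BioSQL biosequence table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE biosequence (
        bioentry_id INTEGER PRIMARY KEY,
        version     INTEGER,
        length      INTEGER,
        alphabet    TEXT,
        seq         TEXT
    )
    """
)
# Invented example row (assumption): a short DNA entry with a lower-case alphabet.
conn.execute("INSERT INTO biosequence VALUES (1, 1, 8, 'dna', 'ACGTACGT')")


def stored_alphabet(connection, bioentry_id):
    """Return the alphabet stored for one bioentry, or None if it has no row."""
    row = connection.execute(
        "SELECT alphabet FROM biosequence WHERE bioentry_id = ?",
        (bioentry_id,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical molecule type as written on the original LOCUS line.
locus_molecule = "DNA"

alphabet = stored_alphabet(conn, 1)
# True when the two values differ only in case, which is what the
# lower-cased round-trip output would suggest.
matches_only_ignoring_case = (
    alphabet != locus_molecule and alphabet.lower() == locus_molecule.lower()
)
print(alphabet, matches_only_ignoring_case)
```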

> P.S.
>
> As has been discussed before, the BioSQL documentation would benefit
> from at least one worked example of a (small) GenBank file showing
> where each field ends up in the database. It would be a reasonable
> amount of work though - but could then be used for a basic compliance
> unit test by all the Bio* interfaces to BioSQL.

I agree that this would be very useful - the SearchIO HOWTO has a 
similar treatment of a BLAST report that I often refer to.


