[Biopython] User-defined SeqRecord annotations are trashed in INSDC formats?
Uri Laserson
laserson at mit.edu
Tue Mar 22 15:44:02 UTC 2011
>
> As far as the current Biopython output goes, you can basically use any
> (short) string as a qualifier key.
>
Sorry, I meant for the values, not the keys. Can you have a list of strings
as a value?
> Using a source feature is really just a workaround for the fact that
> GenBank/EMBL do not support arbitrary record level annotation.
> Do you have to use this as your output format?
Agreed. Essentially, I have a huge pile of sequencing reads that are highly
annotated. For any given read, there are some annotations that are
independent of the sequence itself (which is what I am trying to implement
now) and there are some annotations that are associated with subsequences
(which is why SeqFeatures are very appropriate). Ideally, I want a file
format that stores the data, is easy and fast to parse, and is readable
with something like `less` (though this last feature is less
important).
> Would you not be
> better off using a database or something else instead?
>
Well, initially I used XML to store the data, but I quickly realized I was
reinventing the wheel, especially when it came to annotating features on top
of the sequences.
Are you suggesting something like SQLite? How would I deal with
SeqFeature-type annotations?
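
To make the question concrete, here is a minimal sketch of how a SQLite
layout might hold both kinds of annotation, using only Python's standard
sqlite3 module. Every table, column, and function name below is
hypothetical (none of it is a Biopython API or schema), and coordinates are
assumed to be 0-based and half-open, as in Biopython's FeatureLocation.
Multi-valued annotations are stored as one row per value.

```python
import sqlite3

# Hypothetical schema for illustration only; not a Biopython or BioSQL schema.
SCHEMA = """
CREATE TABLE reads (
    read_id  TEXT PRIMARY KEY,
    sequence TEXT NOT NULL
);
-- Record-level annotations: one row per (name, value) pair.
CREATE TABLE read_annotations (
    read_id TEXT NOT NULL REFERENCES reads(read_id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL
);
-- Subsequence annotations, analogous to SeqFeature objects.
CREATE TABLE features (
    feature_id   INTEGER PRIMARY KEY,
    read_id      TEXT NOT NULL REFERENCES reads(read_id),
    feature_type TEXT NOT NULL,
    start_pos    INTEGER NOT NULL,
    end_pos      INTEGER NOT NULL
);
CREATE TABLE feature_qualifiers (
    feature_id INTEGER NOT NULL REFERENCES features(feature_id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
"""


def store_read(conn, read_id, sequence, annotations, features):
    """Insert one read with its record-level and feature annotations."""
    conn.execute("INSERT INTO reads VALUES (?, ?)", (read_id, sequence))
    for name, values in annotations.items():
        conn.executemany(
            "INSERT INTO read_annotations VALUES (?, ?, ?)",
            [(read_id, name, v) for v in values],
        )
    for feat in features:
        cur = conn.execute(
            "INSERT INTO features (read_id, feature_type, start_pos, end_pos) "
            "VALUES (?, ?, ?, ?)",
            (read_id, feat["type"], feat["start"], feat["end"]),
        )
        conn.executemany(
            "INSERT INTO feature_qualifiers VALUES (?, ?, ?)",
            [(cur.lastrowid, name, v)
             for name, values in feat["qualifiers"].items()
             for v in values],
        )
    conn.commit()


def get_read_annotations(conn, read_id):
    """Return record-level annotations as {name: [values, ...]}."""
    result = {}
    rows = conn.execute(
        "SELECT name, value FROM read_annotations "
        "WHERE read_id = ? ORDER BY rowid",
        (read_id,),
    )
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result


def features_overlapping(conn, read_id, start, end):
    """Return (type, start, end) for features overlapping [start, end)."""
    return conn.execute(
        "SELECT feature_type, start_pos, end_pos FROM features "
        "WHERE read_id = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (read_id, end, start),
    ).fetchall()


# Made-up example data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_read(
    conn,
    "read_0001",
    "ACGTACGTAC",
    annotations={"barcode": ["ATCG"], "tags": ["sample_A", "run_3"]},
    features=[
        {"type": "primer", "start": 0, "end": 4, "qualifiers": {"note": ["fwd"]}},
        {"type": "insert", "start": 4, "end": 10, "qualifiers": {}},
    ],
)
```

With a layout like this, record-level annotations and feature qualifiers
share the same name/value shape, so a list of strings is just several rows.
Whether it would actually beat a flat file for parsing speed is something
that would need measuring.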
Uri
> Peter
>