[Biopython-dev] User-defined annotations in Stockholm alignment file

João Rodrigues j.p.g.l.m.rodrigues at gmail.com
Tue Apr 5 04:21:37 UTC 2016


Thanks Peter, but I'm not sure these issues relate to what I am looking for.

I went through the parser a bit, and what I actually need is to read/write
custom keys in GS records. Specifically, I am parsing Stockholm files
produced by HMMER (and looking to add some extra info to the resulting
Alignment object), which I just realized are not properly formatted
because they contain multiple GS annotations on one line (see below).

> # STOCKHOLM 1.0
> #=GF ID sp|P00929|TRPA_SALTY-i5
> #=GS sp|P00929|TRPA_SALTY          DE Tryptophan synthase alpha chain
> OS=Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720) GN=trpA
> PE=1 SV=1
>
> SeqRecord(seq=Seq('MERYENLFAQLNDR-REG-AFVPFVTLG-D--PGIEQSLKIIDTLIDAGADALE...SRA',
> SingleLetterAlphabet()), id='sp|P00929|TRPA_SALTY',
> name='sp|P00929|TRPA_SALTY', description='Tryptophan synthase alpha chain
> OS=Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720) GN=trpA
> PE=1 SV=1', dbxrefs=[])


I am genuinely surprised that HMMER outputs this odd format. Would it be an
option to check a GS line for such formatting (with a regex?) and, if it
is found, split the line accordingly, or is this a remote edge case where
the added overhead is just too much?
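
A minimal sketch of such a check, assuming the extra fields always look
like the ones in the example above (a two-letter uppercase key followed by
"=", e.g. OS=, GN=, PE=, SV=). The helper name split_gs_line and the
pattern are hypothetical and not part of Biopython:

```python
import re

# Hypothetical helper, not part of Biopython. It assumes embedded fields
# look like " XX=value" with a two-letter uppercase key, as in the example.
EMBEDDED_FIELD = re.compile(r"\s+(?=[A-Z]{2}=)")


def split_gs_line(line):
    """Return (seq_id, {feature: value}) for a single "#=GS" line."""
    tag, seq_id, feature, text = line.split(None, 3)
    if tag != "#=GS":
        raise ValueError("not a GS line: %r" % line)
    # The first chunk belongs to the declared feature (here DE);
    # every later chunk carries its own KEY=value pair.
    first, *rest = EMBEDDED_FIELD.split(text.strip())
    annotations = {feature: first}
    for field in rest:
        key, _, value = field.partition("=")
        annotations[key] = value
    return seq_id, annotations


line = (
    "#=GS sp|P00929|TRPA_SALTY          DE Tryptophan synthase alpha chain "
    "OS=Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720) "
    "GN=trpA PE=1 SV=1"
)
seq_id, annotations = split_gs_line(line)
```

The split only runs on GS lines and uses a single precompiled pattern, so
the extra cost per line should be small. Lines without embedded fields
come back with just the declared feature.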

My original question was: why does AlignIO ignore "custom" annotations it
doesn't know when writing (StockholmIO, line 254
<https://github.com/biopython/biopython/blob/master/Bio/AlignIO/StockholmIO.py#L254>
)?
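
To make the behaviour in question concrete, here is a rough sketch of a
writer that either passes unknown keys through or drops them. It is not
the StockholmIO code; KNOWN_GS_KEYS, format_gs_lines and keep_unknown are
made-up names used only for illustration:

```python
# Hypothetical sketch, not Biopython's StockholmIO implementation.
# The mapping and all names below are assumptions for illustration.
KNOWN_GS_KEYS = {"organism": "OS", "accession": "AC"}


def format_gs_lines(seq_id, annotations, keep_unknown=True):
    """Build "#=GS" lines for one sequence from a dict of annotations."""
    lines = []
    for key, value in annotations.items():
        if key in KNOWN_GS_KEYS:
            feature = KNOWN_GS_KEYS[key]  # translate a known key
        elif keep_unknown:
            feature = key  # pass a custom key through unchanged
        else:
            continue  # drop a key the writer does not know
        lines.append("#=GS %s %s %s" % (seq_id, feature, value))
    return lines


custom = {"organism": "Salmonella typhimurium", "GN": "trpA"}
kept = format_gs_lines("sp|P00929|TRPA_SALTY", custom)
dropped = format_gs_lines("sp|P00929|TRPA_SALTY", custom, keep_unknown=False)
```

With keep_unknown=False the custom GN key is lost on output, which is the
behaviour asked about above. With the default, it is written as its own GS
line.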