[Biopython-dev] [Biopython - Bug #3387] Generic per column annotation from stockholm alignment are not stored in alignment object

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Thu Oct 18 11:02:49 UTC 2012


Issue #3387 has been updated by saverio vicario.

File diff_StockholmIO.py added
File StockholmIO.py added

This is my proposed patch for StockholmIO.
Attached you will find the new StockholmIO.py and a diff against the old one.
To make the new comments easy to find, each of them starts with #SV.

In summary, the patch implements a new attribute, _letter_annotations, for Bio.Align.MultipleSeqAlignment. The iterator stores the GC features in it, and the writer writes the GC features after all sequence records, as stated in http://sonnhammer.sbc.su.se/Stockholm.html.
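
The writing order described above can be sketched as follows. This is only a standalone illustration, not the code in the attached StockholmIO.py: the function name format_stockholm and its arguments are hypothetical, and it handles a single unwrapped alignment block.

```python
def format_stockholm(records, gc_features):
    """Return a minimal Stockholm string with GC lines after all sequences.

    records:     list of (name, aligned_sequence) tuples (hypothetical input shape)
    gc_features: dict mapping a GC feature name to its per-column string
    """
    if not records:
        raise ValueError("at least one sequence record is required")
    length = len(records[0][1])
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("sequence %s has a different length" % name)
    for feature, markup in gc_features.items():
        # Per-column markup must have exactly one character per column.
        if len(markup) != length:
            raise ValueError("GC feature %s does not match alignment length" % feature)

    # Pad every label to a common width so the columns line up.
    labels = [name for name, _ in records]
    labels += ["#=GC " + feature for feature in gc_features]
    width = max(len(label) for label in labels)

    out = ["# STOCKHOLM 1.0"]
    for name, seq in records:
        out.append(name.ljust(width) + " " + seq)
    # GC features are written only after all sequence records.
    for feature, markup in gc_features.items():
        out.append(("#=GC " + feature).ljust(width) + " " + markup)
    out.append("//")
    return "\n".join(out) + "\n"
```

Calling format_stockholm([("seq1", "ACDE"), ("seq2", "AC-E")], {"SS_cons": "HHHH"}) gives a header line, the two sequence lines, the #=GC SS_cons line, and the closing "//".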

I added a new dictionary for GC and GF features using the PFAM standard; it is used in the writing phase so that only legitimate PFAM attributes are written. The only addition to the PFAM standard is the GC feature "RF", which is added by HMMER 3.0 software to indicate which sites were originally present in the profile used to generate the alignment.
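
As a rough sketch of this kind of filtering at write time: the set below is only an illustrative subset of GC feature names plus "RF", not the dictionary from the attached patch, and the names KNOWN_GC_FEATURES and writable_gc_features are hypothetical.

```python
# Illustrative subset only; the patch's real dictionary may differ.
KNOWN_GC_FEATURES = frozenset(
    ["SS_cons", "SA_cons", "TM_cons", "PP_cons", "LI_cons", "RF"]
)


def writable_gc_features(column_annotations, allowed=KNOWN_GC_FEATURES):
    """Keep only the per-column annotations whose feature name is allowed."""
    return {
        feature: markup
        for feature, markup in column_annotations.items()
        if feature in allowed
    }
```

With this sketch, an annotation stored under an unrecognised key is silently skipped on output, while "RF" is kept.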

I do not use the PFAM-standard dictionary to translate the GF and GR attributes in alignment._annotations or the GC attributes in alignment._letter_annotations, as is done for the SeqRecord. This is for consistency with the decision originally taken for the GR attributes in alignment._annotations.

----------------------------------------
Bug #3387: Generic per column annotation from stockholm alignment are not stored in alignment object
https://redmine.open-bio.org/issues/3387

Author: saverio vicario
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The Stockholm format includes 4 types of annotations:
#=GF <feature> <Generic per-File annotation, free text>
#=GC <feature> <Generic per-Column annotation, exactly 1 char per column>
#=GS <seqname> <feature> <Generic per-Sequence annotation, free text>
#=GR <seqname> <feature> <Generic per-Sequence AND per-Column markup, exactly 1 char per column>
GC and GF annotations are not picked up by AlignIO and are not supported in Bio.Align.MultipleSeqAlignment because no annotation is available at the alignment level. In fact, Bio.Align.MultipleSeqAlignment.annotations and Bio.Align.MultipleSeqAlignment.letter_annotations do not exist; there is only Bio.Align.MultipleSeqAlignment._annotations, which is generated from the annotations and letter_annotations of the individual records.
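
To make the four line types listed above concrete, here is a minimal standalone sketch that sorts Stockholm markup lines into separate dictionaries. It does not use Biopython and is not how StockholmIO is implemented; the function name parse_stockholm_markup and the returned data layout are assumptions chosen for illustration.

```python
def parse_stockholm_markup(lines):
    """Collect #=GF, #=GC, #=GS and #=GR lines into four dictionaries."""
    gf = {}  # feature -> list of free-text lines
    gc = {}  # feature -> per-column string
    gs = {}  # seqname -> feature -> list of free-text lines
    gr = {}  # seqname -> feature -> per-column string
    for raw in lines:
        line = raw.rstrip()
        if line.startswith("#=GF"):
            fields = line.split(None, 2)
            if len(fields) < 2:
                continue
            text = fields[2] if len(fields) > 2 else ""
            gf.setdefault(fields[1], []).append(text)
        elif line.startswith("#=GC"):
            fields = line.split(None, 2)
            if len(fields) == 3:
                # Concatenate so that wrapped (multi-block) files still work.
                gc[fields[1]] = gc.get(fields[1], "") + fields[2]
        elif line.startswith("#=GS"):
            fields = line.split(None, 3)
            if len(fields) == 4:
                per_seq = gs.setdefault(fields[1], {})
                per_seq.setdefault(fields[2], []).append(fields[3])
        elif line.startswith("#=GR"):
            fields = line.split(None, 3)
            if len(fields) == 4:
                per_seq = gr.setdefault(fields[1], {})
                per_seq[fields[2]] = per_seq.get(fields[2], "") + fields[3]
    return gf, gc, gs, gr
```

The GF and GC dictionaries are the two that describe the alignment as a whole, which is why they need a home on the alignment object rather than on the individual records.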

GC annotations in Stockholm files contain the quality score of the sites (columns of the alignment), which is an important parameter when deciding whether or not to trim a site.
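
As an example of how such per-column scores could drive trimming, the sketch below drops columns whose score is under a threshold. It assumes a hypothetical encoding in which each column carries a digit 0-9 or "*" (treated as 10) and any other character means "no score"; the actual encoding depends on the feature and on the program that wrote the file. The function names are also hypothetical.

```python
def column_score(ch):
    """Map one markup character to a number, or None if it has no score.

    Assumed encoding (illustration only): '0'-'9' are scores, '*' is 10.
    """
    if ch == "*":
        return 10
    if ch in "0123456789":
        return int(ch)
    return None


def trim_columns(sequences, gc_quality, min_score):
    """Return (trimmed_sequences, kept_indices) for columns scoring >= min_score.

    sequences:  dict mapping a sequence name to its aligned string
    gc_quality: per-column string, one character per alignment column
    """
    for name, seq in sequences.items():
        if len(seq) != len(gc_quality):
            raise ValueError("sequence %s does not match markup length" % name)
    kept = []
    for index, ch in enumerate(gc_quality):
        score = column_score(ch)
        if score is not None and score >= min_score:
            kept.append(index)
    trimmed = {
        name: "".join(seq[i] for i in kept) for name, seq in sequences.items()
    }
    return trimmed, kept
```

Unscored columns are dropped in this sketch; a real tool might prefer to keep them or to make that choice configurable.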







