[Biopython] Converting from NCBIXML to SearchIO
Martin Mokrejs
mmokrejs at fold.natur.cuni.cz
Thu Feb 13 20:38:34 UTC 2014
Hi,
I am in the process of converting my code to the new XML parsing code written by Bow.
So far, I have worked out the following replacements (written roughly in sed(1) format):
/hsp.identities/hsp.ident_num/
/hsp.score/hsp.bitscore_raw/
/hsp.expect/hsp.evalue/
/hsp.bits/hsp.bitscore/
/hsp.gaps/hsp.gap_num/
/hsp.positives/hsp.pos_num/
/hsp.sbjct_start/hsp.hit_start/
/hsp.sbjct_end/hsp.hit_end/
# hsp.query_start # no change from NCBIXML
# hsp.query_end # no change from NCBIXML
/record.query.split()[0]/record.id/
/alignment.hit_def.split(' ')[0]/alignment.id/
/record.alignments/record.hits/
/hsp.align_length/hsp.aln_span/ # I hope this means the same as in NCBIXML (I don't remember whether the count includes the gap characters ('-') of the alignment or not)
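
The plain attribute renames in the list above could be applied mechanically to old NCBIXML-based source code. Below is a minimal sketch using only the standard library. The names RENAMES and rename_attributes are hypothetical, made up for this illustration. The sketch assumes that hsp.score was the raw score and hsp.bits the bit score in NCBIXML, so they map to hsp.bitscore_raw and hsp.bitscore respectively. The two rewrites that involve split() calls are left for manual editing.

```python
import re

# Old NCBIXML attribute -> new SearchIO attribute (simple renames only).
# Assumption: hsp.score is the raw score, hsp.bits is the bit score.
RENAMES = {
    "hsp.identities": "hsp.ident_num",
    "hsp.positives": "hsp.pos_num",
    "hsp.gaps": "hsp.gap_num",
    "hsp.expect": "hsp.evalue",
    "hsp.bits": "hsp.bitscore",
    "hsp.score": "hsp.bitscore_raw",
    "hsp.sbjct_start": "hsp.hit_start",
    "hsp.sbjct_end": "hsp.hit_end",
    "hsp.align_length": "hsp.aln_span",
    "record.alignments": "record.hits",
}

# A single alternation applied in one pass, so an already renamed
# attribute (e.g. hsp.bitscore) is never rewritten a second time.
# The lookarounds require the match to be a whole dotted name.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")(?!\w)"
)


def rename_attributes(source: str) -> str:
    """Return source with every old attribute name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


if __name__ == "__main__":
    print(rename_attributes("if hsp.expect < 1e-5: total += hsp.identities"))
```

A rename of this kind only covers naming. It does not catch semantic differences: SearchIO reports coordinates Python-style (zero-based start, half-open), so values such as hsp.hit_start may differ by one from the old hsp.sbjct_start even though the rename itself is correct.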
Now I am uncertain. There used to be hsp.sbjct_length and alignment.length. I think the former length included the gap characters ('-') while the latter is just the real length of the query sequence.
In any case, what did alignment.length turn into? len(hsp.query_all)? I don't think it is hsp.query_span, but who knows. ;)
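
The distinction in question, a length that counts gap columns versus a length that counts only residues, can be shown on a pair of aligned strings. This is a sketch only: alignment_lengths is a hypothetical helper, the dictionary keys merely borrow SearchIO-style names for readability, and it assumes that the gap count is the total number of '-' characters in both aligned strings. As far as can be told from the BLAST XML fields, NCBIXML's alignment.length held the full length of the hit (subject) sequence rather than anything derived from the alignment, and SearchIO appears to expose that value as seq_len on the hit object. Treat that as an assumption to check against your Biopython version.

```python
def alignment_lengths(query_aln: str, hit_aln: str) -> dict:
    """Compare gap-inclusive and gap-free lengths of one aligned pair."""
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have the same length")
    query_gaps = query_aln.count("-")
    hit_gaps = hit_aln.count("-")
    return {
        # Number of alignment columns, gap columns included.
        "aln_span": len(query_aln),
        # Residues actually covered on each sequence, gaps excluded.
        "query_residues": len(query_aln) - query_gaps,
        "hit_residues": len(hit_aln) - hit_gaps,
        # Assumption: total gap characters across both strings.
        "gap_num": query_gaps + hit_gaps,
    }


if __name__ == "__main__":
    print(alignment_lengths("ACGT-ACGT", "ACGTTAC-T"))
```

Neither of these numbers is the full length of the hit sequence, which is why a value like alignment.length cannot be recovered from the aligned strings alone.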
Meanwhile, I see that my biopython-1.62 doesn't understand hsp.gap_num; it looks like that was added to SearchIO in 1.63. So that's all from me for now, until I upgrade. ;)
Thank you,
Martin