[EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Peter Rice pmr at ebi.ac.uk
Wed Jan 13 12:36:08 EST 2010


Hi Avi,

> I made a mistake and took repeat-masked contigs instead of the
> original contigs, and they indeed had Ns. Sorry for the mess (still,
> I am looking for an option where Ns are not included in the ORF).

Just too late for the next EMBOSS release (in preparation), but a good 
suggestion for July.

We should look at adding options to all the translation programs for 
repeat-masked inputs. This probably means treating each unmasked (non-N) 
region as a separate sequence, with options either to include an ORF 
running up to the Ns or to stop at the last stop codon, and the same for 
the start of an ORF. This is similar to how the start and end of the 
whole sequence are handled.
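
To illustrate the idea, here is a minimal Python sketch, not EMBOSS code. 
It splits a sequence into unmasked regions and finds stop-to-stop ORFs in 
one forward frame of each region. The names unmasked_regions, 
orfs_in_frame and keep_partial are hypothetical, and the standard stop 
codons are assumed.

```python
import re

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def unmasked_regions(seq):
    """Yield (start, subsequence) for each maximal run of non-N bases."""
    for match in re.finditer(r"[^Nn]+", seq):
        yield match.start(), match.group().upper()


def orfs_in_frame(region, frame=0, keep_partial=True):
    """Return stop-to-stop ORFs (nucleotide strings) in one forward frame.

    keep_partial=True keeps ORFs that touch either edge of the region,
    treating the edge like the start or end of a whole sequence.
    keep_partial=False keeps only ORFs with a stop codon on both sides.
    """
    orfs = []
    current = []
    seen_stop = False  # True once a stop codon has been passed
    for i in range(frame, len(region) - 2, 3):
        codon = region[i:i + 3]
        if codon in STOP_CODONS:
            if current and (keep_partial or seen_stop):
                orfs.append("".join(current))
            current = []
            seen_stop = True
        else:
            current.append(codon)
    # Whatever is left runs up to the Ns (or the sequence end).
    if current and keep_partial:
        orfs.append("".join(current))
    return orfs


if __name__ == "__main__":
    masked = "ATGAAATAACCCNNNNNGGGTAGTTTTGA"
    for start, region in unmasked_regions(masked):
        print(start, orfs_in_frame(region, keep_partial=True))
```

With keep_partial=False only ORFs bounded by stop codons inside the same 
unmasked region are reported, so nothing runs into the Ns.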

Hope that will help

Peter

