[EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Peter Rice pmr at ebi.ac.uk
Tue Jan 12 09:15:28 EST 2010


Hi Avi,

> The input is a simple fasta file with only A,C,T,G letters and
> nothing else, so I wouldn't expect any Xs. In addition, even if there
> were Ns (and there are no Ns) the program cannot know whether such Ns
> include stop codons, so it should not consider them as part of an ORF.

>>> 00001_3 [803 - 1120]
>> LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN
>> ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR

That suggests the Xs have all come from stop codons.

There are other possibilities, including a badly formatted input file
(perhaps two sequences and descriptions read as one).

We do need to see the input file to know where those Xs are from.
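
In the meantime, a quick check of the input along the following lines may help narrow it down. This is only a rough Python sketch and not part of EMBOSS; the function name scan_fasta is made up for illustration. It assumes a plain FASTA file, and reports for each record the header, the sequence length, and a count of every character other than A, C, G or T. A description line that is missing its ">" would show up here as one long record with many unexpected characters.

```python
from collections import Counter


def scan_fasta(lines):
    """Return (header, length, Counter of non-ACGT characters) for each record.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    records = []
    header, length, odd = None, 0, Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, length, odd))
            header, length, odd = line[1:], 0, Counter()
        else:
            if header is None:
                header = ""  # sequence data found before any header line
            length += len(line)
            # Count anything that is not an unambiguous base (case-insensitive).
            odd.update(ch for ch in line.upper() if ch not in "ACGT")
    if header is not None:
        records.append((header, length, odd))
    return records


# Small in-memory demonstration; for a real file, pass the open handle instead.
demo = [">seq1", "ACGTACGT", "ACNNACGT", ">seq2", "acgtacgt"]
for header, length, odd in scan_fasta(demo):
    print(header, length, dict(odd))
```

More than one record, or any non-empty count, in a file expected to hold a single A/C/G/T-only sequence would point at the input rather than at getorf.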

Peter Rice

