[EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior
Rozenbaum, Daniel (Biocceleration Inc)
daniel.rozenbaum at USPTO.GOV
Fri Oct 12 21:27:56 UTC 2012
Hello everyone,
We have encountered the following issue: if a list file contains an erroneous (most likely unintentional) entry of the form "db:<space character>seqname", EMBOSS issues no error or warning but treats the entry as "db:*". Here's an example using the test database tsw from the EMBOSS distribution; it contains 100 sequences, three of which match the pattern "hba*":
% seqret tsw -auto -stdout | egrep "^>" | wc -l
100
% cat list2
tsw:hba*
% seqret list::list2 -auto -stdout | egrep "^>" | wc -l
3
% cat list1
tsw: hba*
% seqret list::list1 -auto -stdout | egrep "^>" | wc -l
100
Of course, the immediate answer is to instruct users to be careful not to let unintended spaces into their USAs. Might it be possible, though, to add some protection against the potentially serious consequences of such an error? In one such instance the resulting clustalw process ended up attempting to build a multiple alignment across the entire UniProt database, which the server didn't handle well :-)
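Until something like that exists in EMBOSS itself, a pre-flight check on the list file can catch the mistake before seqret or clustalw ever reads it. Below is a minimal sketch of such a check as a shell function. check_usa_list is a hypothetical name of our own, not an EMBOSS program. It assumes one USA per line and flags any line in which a colon is followed by whitespace, so it is a deliberately strict heuristic that may also flag legitimate lines containing ": ".

```shell
#!/bin/sh
# check_usa_list: hypothetical helper, NOT part of EMBOSS.
# Assumes one USA per line and flags any line in which a colon is
# followed by whitespace (e.g. "tsw: hba*"), the pattern that seqret
# silently treated as "tsw:*" in the example above.
check_usa_list() {
    # Refuse to vouch for a file we cannot read.
    if [ ! -r "$1" ]; then
        echo "error: cannot read list file '$1'" >&2
        return 2
    fi
    # grep -n prints each offending line prefixed by its line number;
    # its exit status is 0 when at least one line matched.
    if grep -nE ':[[:space:]]' "$1"; then
        echo "error: whitespace after ':' in list file '$1'" >&2
        return 1
    fi
    return 0
}
```

Used as a guard, the run only proceeds when the file is clean:

check_usa_list list1 && seqret list::list1 -auto -stdout

With list1 from the example above, this prints "1:tsw: hba*" plus the error message and never starts seqret, while list2 passes through unchanged.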
With best regards,
Daniel
--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314