[Bioperl-l] Creating FASTA library.

Fri, 30 Aug 2002 10:10:26 +0100 (BST)

On Fri, 30 Aug 2002, Ewan Birney wrote:

> > generating a single file from a directory of peptides suitable to input into
> > clustalw?  I have only been able to find a third party application that runs
> > on windows to do it (I am running the bioperl package in OSX).
> 
> Easy Unix way (will work on Mac OS X), assuming all the files end in
> .pep and they are all FASTA files:
> 
> cat *.pep > file_for_clustalw

If you have thousands of files and the Mac OS X shells have the same
limitations as the 'normal' Unix shells (you get an "Arg list too long" or
similar message), you can use this workaround:

find . -name "*.pep" -exec cat \{\} >> file_for_clustalw \;

(Most Unix shells expand wildcards before running the command, and with
lots of matches the expanded list exceeds the limit on the total length of
a command's arguments. In the 'find' version the pattern is quoted, so the
shell passes it through unexpanded; find then checks each file name against
the pattern itself and executes the command given after the -exec
parameter for every match. The backslashes are required if your shell
treats {, }, and ; as special characters.)
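A variant worth knowing, offered as a sketch rather than a drop-in: most current find implementations (GNU and BSD among them) accept the POSIX `-exec ... {} +` form, which hands many file names to each `cat` call instead of starting one `cat` per file, while still keeping each call under the argument limit. The function name `concat_pep` is made up for illustration. Like the command above, and unlike `cat *.pep`, it also descends into subdirectories.

```shell
#!/bin/sh
# Hypothetical helper: concat_pep DIR OUTFILE
# Concatenates every *.pep file under DIR into OUTFILE.
# The '+' terminator (instead of '\;') lets find batch file names into as
# few cat invocations as fit under the system's argument-length limit.
# The shell opens OUTFILE once, so '>' truncates it a single time.
concat_pep() {
  find "$1" -name '*.pep' -exec cat {} + > "$2"
}

# Usage:
#   concat_pep . file_for_clustalw
```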

If you need more flexibility or more processing, you need a script, as Ewan
pointed out.
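As one example of "more processing", here is a small shell sketch (not the script Ewan mentioned) that guards against a common FASTA pitfall: if an input file does not end with a newline, plain concatenation glues the next file's '>' header onto its last sequence line. The helper name `concat_fasta` is hypothetical, and the `while read` loop assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: concat_fasta DIR OUTFILE
# Like the one-liners, but makes sure every input file ends with a newline
# so that each '>' header starts on its own line in OUTFILE.
concat_fasta() {
  dir=$1
  out=$2
  : > "$out"    # start from an empty output file
  find "$dir" -name '*.pep' | while IFS= read -r f; do
    cat "$f" >> "$out"
    # tail -c 1 prints the file's last byte; $(...) strips a trailing
    # newline, so a non-empty result means the file lacked one.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$out"
    fi
  done
}

# Usage:
#   concat_fasta . file_for_clustalw
```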

With kind regards,

Jean-Jack Riethoven

EMBL Outstation - Hinxton           pow@ebi.ac.uk     ICQ#: 3433929
European Bioinformatics Institute   Phone: (+44) 1223 494635      
Wellcome Trust Genome Campus        Fax  : (+44) 1223 494468
Hinxton, Cambridge CB10 1SD         URL  : http://industry.ebi.ac.uk/
UNITED KINGDOM