[Bioperl-l] Creating FASTA library.

Robert Citek rwcitek@alum.calberkeley.org
Fri, 30 Aug 2002 10:15:44 -0500


Hello Jean-Jack,

At 10:10 AM 8/30/2002 +0100, Jean-Jack Riethoven wrote:
>If you have thousands of files and if the MacOS X shells have the same
>limitations of the 'normal' unix shells (you get a "Arg list too long" or
>similar message), you can use this workaround:
>
>find . -name "*.pep" -exec cat \{\} >> file_for_clustalw \;

This way will fork 'cat' once for every single .pep file, which will
probably be very slow.  (The shell opens file_for_clustalw only once, since
it sets up the ">>" redirection before find even runs.)  If you do have
thousands of files, a modified method would be to pipe the names to the
xargs command:

find . -type f -name "*.pep"  | xargs cat > file_for_clustalw

I've included the "-type f" switch to ensure that you only get files and
not directories.  Also, the ">>" was changed to ">" so that the output file
starts out empty instead of growing each time the command is rerun.  xargs
knows how to limit the length of the command line.  You can also force
xargs to take at most a certain number of arguments per invocation with the
-n option (--max-args in GNU xargs; --max-lines counts input lines, not
arguments), and have it behave nicely with names containing spaces with the
-0 option, paired with find's -print0:

find . -type f -name "*.pep" -print0 |
  xargs -0 -n 100 cat > file_for_clustalw
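
If you want to convince yourself of how xargs batches its arguments, here
is a small sketch you could run in a scratch directory.  The directory, the
file names and the sequences are all made up for illustration, and I use a
batch size of 2 only so that the batching is visible with three files:

```shell
#!/bin/sh
# Hypothetical scratch directory; nothing here touches your real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three tiny .pep files (made-up content), one with a space in its name.
printf '>seq1\nMKV\n' > a.pep
printf '>seq2\nMLT\n' > 'b c.pep'
printf '>seq3\nMAA\n' > d.pep

# Substitute echo for cat to see the batching: with at most 2 names per
# invocation, 3 files need 2 invocations, so echo prints 2 lines.
batches=$(find . -type f -name "*.pep" -print0 | xargs -0 -n 2 echo | wc -l | tr -d ' ')

# The real concatenation.  The shell opens the output file once, and every
# cat started by xargs writes to that same open file.
find . -type f -name "*.pep" -print0 | xargs -0 -n 2 cat > file_for_clustalw

# Count the FASTA header lines that ended up in the library.
records=$(grep -c '^>' file_for_clustalw)
echo "batches=$batches records=$records"
```

Note that find does not promise any particular order, so the sequences may
land in the output file in a different order than 'ls' would show them.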

Regards,
- Robert