[Bioperl-l] comparing fasta sequences in multiple files

Sean Davis sdavis2 at mail.nih.gov
Sat Mar 13 19:49:46 EST 2010


On Sat, Mar 13, 2010 at 6:57 PM, robby jhones <robby.hones at gmail.com> wrote:
> Dear Group,
>
>  Can anyone offer advice on comparing multiple fasta sequences across many
> files? We have thousands of fasta sequences in individual files, from which I
> would like to fish out and print to a new file (the sequence and ID) ONLY
> the sequences which appear in at least a few of the files: 3 out of 4 runs,
> or perhaps all 4 runs (as some are replicates).
>
>  Is there something out there which would do this?

Hi, Robby.

It sounds like making a hash keyed on sequence ID, and incrementing the
count for each ID as you loop over the files, would give you what you
want. Any ID whose final count reaches your threshold (3 of 4, say) is
one to print.
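
A minimal sketch of that counting idea, written in plain Python rather
than BioPerl so that it needs no extra modules. It assumes the same
sequence carries the same ID in every file (the first token after ">"),
and it counts each ID at most once per file so that duplicates within a
single run do not inflate the total. The function names and the
min_files parameter are made up for illustration.

```python
from collections import Counter


def read_fasta(path):
    """Yield (seq_id, sequence) pairs from a FASTA file."""
    seq_id, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                # Assumption: the ID is the first token on the header line.
                fields = line[1:].split()
                seq_id, chunks = (fields[0] if fields else ""), []
            elif line and seq_id is not None:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def shared_records(paths, min_files):
    """Return {seq_id: sequence} for IDs seen in at least min_files files."""
    counts = Counter()   # seq_id -> number of files containing it
    first_seen = {}      # seq_id -> sequence from the first file that had it
    for path in paths:
        ids_in_file = set()
        for seq_id, seq in read_fasta(path):
            ids_in_file.add(seq_id)
            first_seen.setdefault(seq_id, seq)
        # Updating from a set counts each ID once per file.
        counts.update(ids_in_file)
    return {i: s for i, s in first_seen.items() if counts[i] >= min_files}


def write_fasta(records, out_path):
    """Write {seq_id: sequence} as FASTA, one sequence line per record."""
    with open(out_path, "w") as out:
        for seq_id, seq in records.items():
            out.write(f">{seq_id}\n{seq}\n")
```

With four run files, shared_records(paths, 3) would keep the IDs present
in at least 3 of them, and write_fasta would print those to a new file.
If the IDs differ between runs, keying the hash on the sequence string
itself instead of the ID is one possible variation.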

Sean


