[Bioperl-l] comparing fasta sequences in multiple files

Sun Mar 14 12:38:15 UTC 2010

On Sun, Mar 14, 2010 at 2:12 AM, robby jhones <robby.hones at gmail.com> wrote:
> I think that I'll need to write a hash of the IDs and sequences, then
> iterate over the sequences to see if they are identical and if so push them
> and the ID into an output file. I was hoping there was something out there
> like this, but I suppose not.

Look in the mailing list archives for the last week or so. There was
some discussion about generating hashes of sequences; you could use
that approach to build your hash of unique sequences.
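
As a rough sketch of that idea (in plain Python rather than BioPerl, and
not taken from the archived thread): key a dictionary on a digest of each
sequence and record which runs it turns up in. The names read_fasta,
shared_sequences and min_runs are made up for illustration, and the use of
SHA-256 and the case-folding are assumptions about what should count as
"identical".

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def shared_sequences(runs, min_runs=3):
    """Return {sequence: sorted IDs} for sequences seen in >= min_runs runs.

    `runs` maps a run name to an iterable of FASTA lines (e.g. an open file).
    """
    seen_in = defaultdict(set)  # digest -> names of runs containing it
    ids = defaultdict(set)      # digest -> every ID the sequence appeared under
    by_digest = {}              # digest -> the sequence itself
    for run_name, lines in runs.items():
        for seq_id, seq in read_fasta(lines):
            seq = seq.upper()   # assumption: case differences do not matter
            digest = hashlib.sha256(seq.encode()).hexdigest()
            seen_in[digest].add(run_name)
            ids[digest].add(seq_id)
            by_digest[digest] = seq
    return {
        by_digest[d]: sorted(ids[d])
        for d, run_names in seen_in.items()
        if len(run_names) >= min_runs
    }
```

With real files you would pass something like
{path: open(path) for path in paths} as runs. Tracking a set of run names
per digest, rather than a bare counter, means a sequence repeated inside
one file still counts as one run. Keying on the ID instead of the digest
gives the simpler count-per-ID variant.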

Sean

> On Sat, Mar 13, 2010 at 4:49 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones <robby.hones at gmail.com> wrote:
>> > Dear Group,
>> >
>> > Can anyone offer advice on comparing multiple fasta sequences across many
>> > files? We have thousands of fasta sequences in individual files, from which
>> > I would like to fish out and print to a new file (sequence and ID) ONLY the
>> > sequences that appear in at least a few of the files: 3 out of 4 runs,
>> > perhaps all 4 runs (as some are replicates).
>> >
>> >  Is there something out there which would do this?
>>
>> Hi, Robby.
>>
>> It sounds like making a hash of IDs and then incrementing a count for
>> each as you loop over files would give you what you want?
>>
>> Sean