[Bioperl-l] fasta file parser

Tue Jul 22 12:42:31 UTC 2008

ste.ghi at libero.it wrote:
> Dear all,
> I'm trying to write a script which, given a file containing a list of
> IDs, parses a big fasta file, returning only the sequences NOT listed in
> the list file.
> 
> To do so, I first create an array with the IDs to be excluded:
> 
> [...]
> 
> #Load LIST content in an array; avoids duplicates
> while (my $line = <LIST>) {
> 
> 
>     push(@array1,$line );    
> 
>     foreach my $uniq ( @array1 ){
> 
> 	next if $seen{ $uniq }++;
> 
> 	push @unique, $uniq;
> 
>     }
> }

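I'm not sure what this loop is meant to do (it is probably the cause of
your problem): the inner foreach rescans the whole array for every line
read, and the lines are never chomped, so each ID keeps its trailing
newline and will not match a sequence ID. Hashes are your friend: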

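# chomp returns the number of characters removed, not the string,
# so strip the newlines first and then build the lookup hash
chomp(@list = <LIST>);
%unique = map { $_ => 1 } @list;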
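
A slightly more defensive version of the same hash idea, as a minimal sketch. The helper name load_ids and the in-memory demo list are made up for illustration; the only assumption about your list file is that it holds one ID per line.

```perl
use strict;
use warnings;

# Read one ID per line from a filehandle and return a hash whose keys
# are the unique IDs. load_ids is a hypothetical helper, not BioPerl.
sub load_ids {
    my ($fh) = @_;
    my %unique;
    while (my $line = <$fh>) {
        chomp $line;                  # drop the trailing newline
        $line =~ s/^\s+|\s+$//g;      # trim stray whitespace (e.g. "\r")
        next if $line eq '';          # skip blank lines
        $unique{$line} = 1;           # duplicates collapse onto one key
    }
    return %unique;
}

# Demo on an in-memory "file" so the sketch runs without any input file
my $text = "seq1\nseq2\nseq1\n\nseq3\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory list: $!";
my %unique = load_ids($fh);
close $fh;

print join(",", sort keys %unique), "\n";
```

Because the keys are the IDs themselves, "exists $unique{$id}" answers the "is this ID in the list?" question in one step, with no second array and no nested loop.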

> Then, process the fasta file in this way (NOT WORKING).
> 
> #Fasta file processing
> my $newSeqFileName  = Bio::SeqIO->new(-file=> ">>INFILE", -format=>'fasta');
> while (my $query = $SeqFileName->next_seq()) {
     if (defined $unique{$query->id}) {
>             		print $query->id." matched with $elem listed in $ARGV[1]: skipped!\n";
>                         next;
> 		} 
     else {
>        			next if $seen2{ $query->id }++;
> 			$newSeqFileName->write_seq($query);
> 
>             	}          
> 
>         }
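
Putting the pieces together, here is a minimal sketch of the whole filter. In the snippet as quoted, the only Bio::SeqIO object is opened for appending (">>INFILE") while the loop reads from $SeqFileName, which is not defined in the part shown; the sketch assumes you want one object for reading and a second one for writing. The helper name filter_fasta and the command-line argument order are assumptions for illustration, not anything from your script.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Copy every record from $in_path to $out_path unless its ID is a key
# of %$exclude or has already been written once. Returns the number of
# records written. filter_fasta is a hypothetical helper name.
sub filter_fasta {
    my ($in_path, $out_path, $exclude) = @_;

    # One stream for reading, a separate one (note the ">") for writing
    my $in  = Bio::SeqIO->new(-file => $in_path,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_path", -format => 'fasta');

    my %seen;
    my $written = 0;
    while (my $seq = $in->next_seq) {
        my $id = $seq->id;
        next if exists $exclude->{$id};   # listed in the ID file: skip
        next if $seen{$id}++;             # already written: skip
        $out->write_seq($seq);
        $written++;
    }
    $out->close;
    return $written;
}

# Assumed usage: perl filter_fasta.pl ids.txt in.fasta out.fasta
if (@ARGV == 3) {
    my ($list_path, $in_path, $out_path) = @ARGV;

    open(my $list, '<', $list_path) or die "Cannot open $list_path: $!";
    chomp(my @ids = <$list>);
    close $list;

    # Build the lookup hash, ignoring empty lines
    my %exclude = map { $_ => 1 } grep { length } @ids;

    my $n = filter_fasta($in_path, $out_path, \%exclude);
    print "Wrote $n sequences to $out_path\n";
}
```

The fasta file is read once, record by record, so it never has to fit in memory; only the ID list is held in the hash.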