[Bioperl-l] R: Re: fasta file parser

Tue Jul 22 14:17:06 UTC 2008

guys, thanks for your ready replies
but now I'm a little confused...

the first 'while' was just to avoid duplicates in the list...
does your code create a cleared list? what should I write where you say "# do whatever else here..."

thanks

----Messaggio originale----

Da: cjfields at uiuc.edu

Data: 22/07/2008 15.00

A: "ste.ghi at libero.it"<ste.ghi at libero.it>

Cc: <bioperl-l at lists.open-bio.org>

Ogg: Re: [Bioperl-l] fasta file parser

Use exists() directly on the %seen lookup hash for your test.  Also, use chomp up front when creating %seen to get rid of newlines.
# assuming you have one ID per line...while (my $line = <LIST>) {
   chomp $line;   $seen{$line}++;   # do whatever else here...}
close LIST;
my $newSeqFileName = Bio::SeqIO->new(-file=> ">>INFILE", -format=>'fasta');
while (my $query = $SeqFileName->next_seq()) {    my $id = $query->id;
    if ( exists( $seen{$id} ) ) {        print "$id matched with $seen{$id} listed in $ARGV[1]: skipped!\n";
        next;
    }    elsif ( $elem ne $query->id ) {        if ( exists( $seen2{$id}++ ) ) {;            print "$id matched with $seen2{$id}, already found: skipped!\n";
            next;        }
        $newSeqFileName->write_seq($query);    }}
chris
On Jul 22, 2008, at 6:28 AM, ste.ghi at libero.it wrote:Dear all,
I'm trying to write a script wich, given a file containing a list of 
IDs, parses a big fasta file returning only sequences NOT listed in the list-
file.

To do so, I first create an array with the IDs to be excluded:

[...]

#Load LIST content in an array; avoids duplicates
while (my $line = <LIST>) {

    push(@array1,$line );    

    foreach my $uniq ( @array1 ){

	next if $seen
{ $uniq }++;

	push @unique, $uniq;

    }
}

then, process the fasta file in 
this way (NOT WORKING).

#Fasta file processing
my $newSeqFileName  = Bio::
SeqIO->new(-file=> ">>INFILE", -format=>'fasta');
while (my $query = 
$SeqFileName->next_seq()) {
       foreach my $elem(@unique){
		chomp $elem;

       	if ($elem eq $query->id) {  

            		print $query->id." matched 
with $elem listed in $ARGV[1]: skipped!\n";
                        next;
		} 
elsif ($elem ne $query->id) {
       			next if $seen2{ $query->id }++;

$newSeqFileName->write_seq($query);

            	}          

        }
}

...
in this way I get only an exact copy of the input file....where am I wrong?

Thanks a lot for your kind help!
Stefano
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

 Christopher FieldsPostdoctoral ResearcherLab of Dr. Marie-Claude HofmannCollege of Veterinary MedicineUniversity of Illinois Urbana-Champaign