Yuval, if this is all you're doing, it may not be worth using Bioperl.
Here's a Perl one-liner that does it: cut and paste he whole thing onto the
command line and replace the filenames (id_list, input.fasta, and
found.fasta at the end) with your own.

perl -e '($id,$fasta)=@ARGV; open(ID,$id); while (<ID>) {s/\r?\n//;
/^>?(\S+)/; $ids{$1}++;} $num_ids = keys %ids; open(F, $fasta); $s_read =
$s_wrote = $print_it = 0; while (<F>) { if (/^>(\S+)/) {$s_read++; if
($ids{$1}) {$s_wrote++; $print_it = 1; delete $ids{$1}} else {$print_it =
0}}; if ($print_it) {print $_}}; END {warn "Searched $s_read FASTA
records.\nFound $s_wrote IDs out of $num_ids in the ID list.\n"}' id_list
input.fasta > found.fasta 

http://www.cgr.harvard.edu/cbg/scriptome/UNIX/Tools/Choose.html#choose_a_set
_of_fasta_sequences_from_a_file__choose_fastas_from_list_

(There's a teensy-bit different version for Windows Perl too.)

You can email me off-list for more info.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann@infotech.monash.edu.au] 
> Sent: Wednesday, April 05, 2006 6:14 PM
> To: Yuval Itan
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Getting sequences by ID
> 
> On Wed, 2006-04-05 at 18:03 +0100, Yuval Itan wrote:
> > I would be grateful for an advice from you regarding 
> Bioperl, after I was 
> > fiddling around trying to write the Perl script for that 
> from scratch.
> > I have a large fasta file of about 20,000 genes, and 
> another file which is a 
> > list of about 2,000 gene IDs (no sequences), all included 
> in the large file. 
> > I need to create a fasta file which will include only the 
> genes with these 
> > specific 200 IDs. I was wondering if there is a method in 
> Bioperl that will 
> > allow me to do the following pseudocode:
> > 
> > For each $ID from 200_IDs_set_file
> > {
> > 	$my_seq = get_sequence_by_ID(from large_fasta_file, $ID)
> > 	write $my_seq into file
> > }
> 
> There are many possibilities involving combinations of pure Perl and
> BioPerl modules, and some even involving no Perl, but rather using
> commands like 'formatdb' and 'fastacmd -s'. There are probably EMBOSS
> solutions too.
> 
> Using your pseudo code, you could use Bio::Index::Fasta to index your
> 20,000 genes. Then loop over each ID, and retrieve the Seq via the
> index, and write it out using Bio::SeqIO.
> 
> Perhaps look at it from another perspective:
> 
> # put all the IDs we want into a hash (read from file)
> my %want_id = .... ; 
> 
> foreach $seq (use Seq::IO to read large_fasta_file) {
>   if $want_id{$seq->id} then
>     use Seq::IO to write this $seq out
>   end
> }
> 
> 
> 
> -- 
> Torsten Seemann <torsten.seemann@infotech.monash.edu.au>
> Victorian Bioinformatics Consortium
> 
> 
> 

