[Bioperl-l] match patterns to retrieve seqs

Jason Stajich jason.stajich at gmail.com
Mon Mar 19 16:40:11 UTC 2012


And take a look at the script in the bioperl distribution

index/bp_seqret.pl

which will do exactly this.

Jason
On Mar 18, 2012, at 9:49 AM, Philipp Schiffer wrote:

> Hi!
> 
> I am trying to write a small script to retrieve such sequences (including headers) from a fasta file that match the headers I listed in another file.
> The fasta file looks like:
> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
> MLHHR.......
> 
> while the list file is:
> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
> 
> My perl so far would be:
> 
> #! /usr/bin/perl -w
> 
> use Bio::SeqIO;
> 
> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
> my $file   = shift or die $usage;
> my $format = shift or die $usage;
> my $headerlist = shift or die $usage;
> 
> open LIST, "<$headerlist" or die $!;
> my $queries = <LIST>; #might be a hash as well, don't know what's better
> 
> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
> while (my $fastaseq = $fastafile->next_seq){
> 
>    if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much trying it seems actually to look for matches, but does not find any, still there must be
> 
>    {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
>  ;
> };
> 
> So I am wondering, is this the right way to tackle the problem. Will it work? Should I use another bioPerl module? My thinking is, that the pattern matching operation after the 'if' statement is wrong.
> 
> Any help highly appreciated!
> 
> Thanks a lot.
> 
> Philipp
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org





More information about the Bioperl-l mailing list