[Bioperl-l] match patterns to retrieve seqs

EvoPHS philipp.schiffer at gmail.com
Mon Mar 19 19:30:20 UTC 2012


Thanks for help everybody

On Monday, March 19, 2012 3:36:37 PM UTC+1, Frank Schwach wrote:
>
> as you correctly guessed, your regex
> /(@queries)/
> isn't doing what you were hoping it would do, i.e. match against all 
> elements of the array. You could compile a regex with all the 
> alternative patterns from your array but I think that's not very 
> efficient, so you could try something like this instead:
>
> my $does_match = 0;
> $fastaseq ->id =~ /$_/ && $does_match = 1 for @queries;
> if ($does_match){
> ...
> }
>
> Frank
>
>
> On 18/03/12 16:49, Philipp Schiffer wrote:
> > Hi!
> >
> > I am trying to write a small script to retrieve such sequences
> > (including headers) from a fasta file that match the headers I listed in
> > another file.
> > The fasta file looks like:
> >  >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
> > AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
> > MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
> >  >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein
> > AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
> > MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
> >  >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein
> > AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
> > MLHHR.......
> >
> > while the list file is:
> > maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
> > snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
> > maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
> > maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
> >
> > My perl so far would be:
> >
> > #! /usr/bin/perl -w
> >
> > use Bio::SeqIO;
> >
> > my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
> > my $file = shift or die $usage;
> > my $format = shift or die $usage;
> > my $headerlist = shift or die $usage;
> >
> > open LIST, "<$headerlist" or die $!;
> > my $queries = <LIST>; #might be a hash as well, don't know what's better
> >
> > my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
> > while (my $fastaseq = $fastafile->next_seq){
> >
> > if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much
> > trying it seems actually to look for matches, but does not find any,
> > still there must be
> >
> > {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
> > ;
> > };
> >
> > So I am wondering, is this the right way to tackle the problem. Will it
> > work? Should I use another bioPerl module? My thinking is, that the
> > pattern matching operation after the 'if' statement is wrong.
> >
> > Any help highly appreciated!
> >
> > Thanks a lot.
> >
> > Philipp
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research 
>  Limited, a charity registered in England with number 1021457 and a 
>  company registered in England with number 2742969, whose registered 
>  office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>



More information about the Bioperl-l mailing list