[Bioperl-l] Parsing FASTA m10 output

Jason Stajich jason at bioperl.org
Sat Apr 21 17:44:00 UTC 2007


We don't have one yet. This is a new format introduced in the most  
recent release of FASTA.  Hopefully someone can make some time to add  
some code to SearchIO::fasta for it.

I do find that I when I need a fast FASTA to TAB converter that the  
simple script (fastam9_to_table) is more efficient that SearchIO  
framework so Ioannis is making a parallel one for the new m10  
output.  So I think having both is useful.

-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:

> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in  
> this?)
>
> 	-hilmar
>
> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>
>> I have reported it as a bug on the bugzilla but due to bugzilla
>> problems I
>> was not able to attach my code and/or sample m10 files.
>> Nevertheless here is the code that converts an m10 fasta output to
>> an m8
>> BLAST output which is parseable by the vast majority of software.
>>
>> <----------- CODE BEGINS HERE ------------------->
>>
>> #!/usr/bin/perl -w
>>
>> =head1 NAME
>>
>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
>> output
>>
>> =head1 SYNOPSIS
>>
>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>>
>> =head1 DESCRIPTION
>>
>> Command line options:
>>   --header                -- boolean flag to print column header
>>   -o/--out                -- optional outputfile to write data,
>>                              otherwise will write to STDOUT
>>   -h/--help               -- show this documentation
>>
>> Not technically a SearchIO script as this doesn't use any Bioperl
>> components but is a useful and fast.  The output is tabular output
>> with the standard NCBI -m8 columns.
>>
>>  queryname
>>  hit name
>>  percent identity
>>  alignment length
>>  number mismatches
>>  number gaps
>>  query start  (if on rev-strand start > end)
>>  query end
>>  hit start (if on rev-strand start > end)
>>  hit end
>>  evalue
>>  bit score
>>
>> Additionally 4 more columns are provided
>>  percent similar
>>  query length
>>  hit length
>>  query gaps
>>  hit gaps
>>
>> =head1 AUTHOR - Ioannis Kirmitzoglou
>>
>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>>
>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>>
>> Headers as well as portions of code were taken
>>> from fastam9_to_table.pl by Jason Stajich
>>
>> =head1 DISCLAIMER
>>
>> Copyright (c) <2007> <Ioannis Kirmitzolgou>
>>
>> Permission to use, copy, modify, merge, publish and distribute
>> this software and its documentation, with or without modification,
>> for any purpose, and without fee or royalty to the copyright holder 
>> (s)
>> is hereby granted with no restictions and/or prerequisites.
>>
>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND  
>> NONINFRINGEMENT.
>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>
>> =cut
>>
>> use strict;
>> use Getopt::Long;
>>
>> my %data=();
>>
>> my $outfile=''; my $header='';
>> GetOptions(
>>     'header'              => \$header,
>>     'o|out|outfile:s'     => \$outfile,
>>     'h|help'              => sub { exec('perldoc',$0); exit; }
>>        );
>>
>> my $outfh;
>> if( $outfile ) {
>>     open($outfh, ">$outfile") || die("$outfile: $!");
>> } else {
>>     $outfh = \*STDOUT;
>> }
>>
>>
>> $/="\n>>>";
>>
>> my @fields = qw(qname hname percid alen mmcount gapcount
>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap
>> hgap);
>>
>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
>> "\n" if
>> $header;
>>
>> while (<>) {
>>
>>         chomp;
>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>>         my @hits = split(/\d+>>/, $_);
>>         @hits= split("\n>>", $hits[0]);
>>
>>         my $hit = shift @hits;
>>
>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d
>> +)/));
>>
>>         foreach my $align (@hits) {
>>
>>             my @details= split ("\n>", $align);
>>            my $detail = shift @details;
>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>>             $data{'bits'}=$1;
>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>>             $data{'evalue'}=$1;
>>
>>             my $term = quotemeta("; sw_score");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'score'}=$1;
>>
>>             $term = quotemeta("; sw_ident:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'percid'}=$1;
>>
>>             $term = quotemeta("; sw_sim:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'percsim'}=$1;
>>
>>             $term = quotemeta("; sw_overlap:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'alen'}=$1;
>>
>>             $detail = shift @details;
>>
>>             $term = quotemeta("; al_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'qstart'}=$1;
>>
>>             $term = quotemeta("; al_stop:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'qend'}=$1;
>>
>>             $term = quotemeta("; al_display_start:");
>>             $term =~ s/\\ /\\s/;
>>             my $lakis ='';
>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>
>>             $data{'qgap'}=($1 =~ tr/\-//);
>>
>>             $detail = shift @details;
>>
>>             $term = quotemeta("; sq_len:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hlen'}=$1;
>>
>>             $term = quotemeta("; al_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hstart'}=$1;
>>
>>             $term = quotemeta("; al_stop:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hend'}=$1;
>>
>>             $term = quotemeta("; al_display_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>             $data{'hgap'}=($1 =~ tr/-//);
>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>>             $data{'mmcount'} = $data{'alen'} - ( int($data 
>> {'percid'} *
>> $data{'alen'}) + $data{'gapcount'});
>>
>> for ( $data{'percid'}, $data{'percsim'} ) {
>>     $_ = sprintf("%.2f",$_*100);
>> }
>>
>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>>         }
>>
>> }
>>
>> <----------------- CODE ENDS HERE ---------------------->
>>
>> -- 
>>
>> *Ioannis Kirmitzoglou*, MSc
>> PhD. Student,
>> Bioinformatics Research Laboratory
>> Department of Biological Sciences
>> University of Cyprus
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/





More information about the Bioperl-l mailing list