[Bioperl-l] Extract contigs

Antony03 antony.vincent.1 at ulaval.ca
Sun Apr 27 17:04:49 UTC 2014


Hi,

I wrote this short script:
#!/usr/bin/perl 
     #By Antony Vincent#
 


use strict;
use warnings;
use diagnostics;
 
use Bio::Perl;
use Bio::SeqIO;
use IO::String;
use Bio::SearchIO;
use Getopt::Long;
 
my $filename;
my $help;

 
GetOptions(
    'file=s' => \$filename,
    'help!' => \$help,
) or die "Incorrect usage! Try perl new_db.pl -help for exhaustive help.\n";

if( $help ) {
    print "	**********\n";
    print "	***HELP***\n";
    print "	**********\n\n";
    print "One option is required:\n\n";
    print "	-file: Your file in multi-fasta\n\n";
    exit;
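}

# -file is required; stop early with a clear message if it is missing.
die "Missing required option -file. Try perl new_db.pl -help\n"
    unless defined $filename;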
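# Read the names to extract, one per line, from the file "test".
my @taxa_name;
open(my $list_fh, "<", "test") or die "Cannot open test: $!\n";

while (my $line = <$list_fh>) {
    chomp $line;
    push(@taxa_name, $line);
}

close($list_fh);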

# Print the names one per line so they can be checked by eye.
print "$_\n" for @taxa_name;

mkdir 'new_db' unless -d 'new_db';

 
my $gb = Bio::SeqIO->new(-file => "<$filename",
                              -format => "fasta");
 
my $fa = Bio::SeqIO->new(-file => ">new_db/$filename",
                              -format => "fasta",
                              -flush => 0);

SEQ:
while (my $seq = $gb->next_seq) {
    my $id_and_desc = $seq->id . " " . $seq->desc;
    foreach my $str (@taxa_name) {
        if ($id_and_desc =~ /\Q$str\E/) {
            $fa->write_seq($seq);
            next SEQ;
        }
    }
}

It extracts contigs from a multi-FASTA file. The problem is that when I
try to extract contig-1, it also extracts contig-10, contig-11, and so on.

How can I change my code to extract only the contigs whose names match exactly?
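
To illustrate, here is a minimal sketch without BioPerl of the behaviour I
see and of the exact match I am after. The contig names are made up, and the
hash lookup is only my guess at one possible approach; it assumes the names
in my list are identical to the full sequence IDs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names; the real ones come from my list file and FASTA headers.
my @wanted  = ('contig-1');
my @headers = ('contig-1', 'contig-10', 'contig-11', 'contig-2');

# Substring match, as in the script above: /\Q$str\E/ is not anchored,
# so "contig-1" is also found inside "contig-10" and "contig-11".
my @substring_hits = grep {
    my $id = $_;
    scalar grep { $id =~ /\Q$_\E/ } @wanted;
} @headers;

# Exact match: look the whole ID up in a hash of wanted names.
my %is_wanted  = map { $_ => 1 } @wanted;
my @exact_hits = grep { $is_wanted{$_} } @headers;

print "substring: @substring_hits\n";
print "exact:     @exact_hits\n";
```

With these made-up names, the substring version keeps contig-1, contig-10 and
contig-11, while the hash lookup keeps only contig-1.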

Thanks


