[Bioperl-l] Parsing phred/phrap outputs

nkuipers nkuipers@uvic.ca
Tue, 12 Nov 2002 10:31:41 -0800


Hi Alberto,

I'm a BioPerl newbie too, so I hope I don't end up embarrassing myself with this
response.  The .contigs file from PHRAP (at least on my system, when called as
'phrap -ace somefile') is already in FASTA format.  So it sounds like all you
need is a script that removes the XXX... substrings?  If I've misunderstood
your question, stop reading now. :)

You could use Bio::SeqIO for this, although there are also BioPerl modules for
phred/phrap output (which I haven't used yet).  The following untested script
should do it:

use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new( -file => "<./file.contigs",   -format => 'Fasta' );
my $out = Bio::SeqIO->new( -file => ">>trimmed.contigs", -format => 'Fasta' );

while ( my $seq = $in->next_seq() ) {
    # The value returned by $seq->seq can't be modified in place with s///,
    # so copy the sequence string, strip the X's, then set it back.
    ( my $trimmed = $seq->seq ) =~ s/X//g;
    $seq->seq($trimmed);

    $out->write_seq( $seq );
}
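If BioPerl isn't installed, the same idea can be sketched in plain Perl. The helper strip_fasta_x below is hypothetical (it is not part of BioPerl); it assumes, as the script above does, that masked bases are uppercase X and that header lines start with '>'. It also drops any sequence line that becomes empty once its X's are removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove every uppercase X from
# the sequence lines of a FASTA string, leaving '>' header lines untouched.
sub strip_fasta_x {
    my ($text) = @_;
    my @out;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^>/ ) {
            push @out, $line;                    # header line: keep as-is
        }
        else {
            ( my $clean = $line ) =~ tr/X//d;    # delete every X character
            push @out, $clean if length $clean;  # skip lines that were all X
        }
    }
    return join( "\n", @out ) . "\n";
}

# Small demonstration on an in-memory record.
my $demo = ">contig1\nACGTXXXXXXTTGA\nXXXXXXXX\nGGCC\n";
print strip_fasta_x($demo);
```

To use it on a real file, you would read the whole .contigs file into a string and print the result to trimmed.contigs. Note that the output lines may end up with uneven lengths, which most FASTA parsers (including Bio::SeqIO) accept.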

Cheers,

Nathanael Kuipers
---
Center for Biomedical Research,
Dept. of Biology,
University of Victoria

>===== Original Message From "Alberto M. R. Davila" <davila@gene.dbbm.fiocruz.br> =====
>Dear All,
>
>I am new to BioPerl, so please be patient with me... :-)
>
>Running phredPhrap, I got the seqs containing vectors marked with "XXXX" (in
>the "file_name.contigs" and "file-Name.fasta.screen" output files). I
>have been unable to find the complete seq of the pMOS cloning vector on the
>Internet (even at the Amersham site), so I wonder if anyone could provide
>a) a script to parse such "XXXX" and get the seqs in FASTA format, and/or
>b) the complete seq of the pMOS cloning vector.
>
>Thanks in advance for any help you may provide.
>
>Kind regards,
>
>Alberto
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@bioperl.org
>http://bioperl.org/mailman/listinfo/bioperl-l