[Bioperl-guts-l] [Bug 2758] New: assemblyIO - can't read phredPhrap ace file with tagged repeats

bugzilla-daemon at portal.open-bio.org
Wed Feb 11 19:03:48 EST 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2758

           Summary: assemblyIO - can't read phredPhrap ace file with tagged
                    repeats
           Product: BioPerl
           Version: 1.6 branch
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: jayoung at fhcrc.org
                CC: jayoung at fhcrc.org


Hi there,

I have some phredPhrap ace files that make Bio::Assembly::IO die when I try to
read them. I think I've tracked down why: at the end of the file they have some
CT tags that mark repeat sequences (added by phredPhrap's tagRepeats.perl). If I
manually edit the ace file to remove those CT tags, I can read it just fine.
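For anyone who wants to reproduce that manual workaround, here is a minimal sketch of it as a Perl filter. It is not part of BioPerl. The script name strip_ct_tags.pl and the sub strip_ct_tags are hypothetical. The sketch assumes that each consensus tag block in the ace file starts with a line beginning with "CT{" and ends with a line containing only "}". It simply drops those blocks and leaves everything else untouched.

```perl
#!/usr/bin/perl
# Hypothetical helper (strip_ct_tags.pl), not part of BioPerl: copies an ace
# file to STDOUT with every consensus tag block (CT{ ... }) removed.
# Assumes each block starts with a line beginning "CT{" and ends with a
# line holding only "}".
use warnings;
use strict;

# Return the ace text with all CT{ ... } blocks removed.
sub strip_ct_tags {
    my ($text) = @_;
    my @kept;
    my $in_ct = 0;
    # split /^/ breaks the text into lines, keeping each trailing newline
    for my $line (split /^/, $text) {
        if ($in_ct) {
            # A bare "}" closes the current CT block; drop it and resume copying
            $in_ct = 0 if $line =~ /^\}\s*$/;
            next;
        }
        if ($line =~ /^CT\{/) {
            $in_ct = 1;    # start of a consensus tag block: skip it
            next;
        }
        push @kept, $line;
    }
    return join '', @kept;
}

# Run as a filter only when given a file name, e.g.
#   perl strip_ct_tags.pl in.ace > in.ace.edit
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    print strip_ct_tags($text);
}
```

Other tag blocks (for example WA{ ... }) pass through unchanged because the sketch only skips blocks opened by a "CT{" line.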

Here's the error (Contig14 was the first one with the CT tag):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Cannot add feature to unknown contig 'Contig14'
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/jayoung/traskdata/perl/bioperl-live/Bio/Root/Root.pm:357
STACK: Bio::Assembly::IO::ace::next_assembly /home/jayoung/traskdata/perl/bioperl-live/Bio/Assembly/IO/ace.pm:350
STACK: /home/jayoung/bin/check_phredphrap_assembly_very_simple.bioperl:20
-----------------------------------------------------------

I checked out bioperl-live from svn today (revision 15528). I updated
phredPhrap fairly recently (version 080818), so I don't know whether its output
format has changed or whether something in Bio::Assembly::IO has changed.

Anyway, hopefully that'll be an easy fix for someone. I'll attach two ace
files:
another_phredPhrap_test_2.fasta.screen.ace.1 (this one gives the error above)
another_phredPhrap_test_2.fasta.screen.ace.1.edit  (this one reads in fine)

and here's my script: 
-----------------------------------------
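#!/usr/bin/perl

use warnings;
use strict;
use Bio::Assembly::IO;

# Path to the phredPhrap ace file to read
my $ace = shift @ARGV or die "Usage: $0 file.ace\n";

my $aio = Bio::Assembly::IO->new(-file   => $ace,
                                 -format => 'ace');
# Dies here ("Cannot add feature to unknown contig") on the CT-tagged file
my $assembly = $aio->next_assembly;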
-----------------------------------------

This ace file also reveals something I'm not sure whether to call a feature or
a bug. I also get this warning:
--------------------- WARNING ---------------------
MSG: Adding non-nucleotidic sequence Contig25
---------------------------------------------------
Contig25 contains a run of Xs, which phredPhrap uses to mask out matches to
vector sequence. I'm not sure whether it matters for downstream analysis that
this object is treated as non-nucleotide sequence, but it really is nucleotide
sequence.

thanks very much,

Janet

------------------------------------------------------------------- 

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524

-------------------------------------------------------------------

