[Bioperl-l] Assembly::IO write_assembly and remove_seq

Janet Young jayoung at fhcrc.org
Mon Aug 20 22:09:04 UTC 2007


Hi all,

I realized last week that write_assembly isn't implemented in  
Assemble::IO
(see http://bioperl.org/pipermail/bioperl-l/2006-May/021619.html )
I know this has been asked before, but I wondered if anything has  
changed - does anyone have any plans to write a write_assembly  
method? Alternatively, any suggestions for an alternative solution to  
what I'm trying to do?

I'm trying to write a script to make improvements to the assembly  
that phredPhrap comes out with - it seems to quite frequently throw  
an unrelated sequence into a contig with either no matching sequence  
at all, or very little matching sequence. Mysterious. Anyway, my  
script can recognize the bad sequences easily enough, and thought I'd  
be able to remove them and then write the modified assembly. No joy.  
One very inelegant solution I've played with is that I can add some  
"markedHighQuality" tags to the discrepant sequences in the ace file,  
meaning that next time phredPhrap is run, it sometimes manages not to  
assemble the sequences that shouldn't be there. I'm not sure this  
will work in all cases, and it seems like quite an unsatisfactory way  
to do it.

For the same reason, I'm hoping someone can tell me what remove_seq  
does to a contig object? I'm using it and I don't get any error  
messages (returns 1), but when I check the contig object afterwards  
with get_seq_ids, the sequence I wanted to remove didn't seem to go  
away. Also, when I check out the primary_tags for that contig in the  
objects returned by get_features_collection, nothing seems to have  
changed. So I'm not sure whether the sequence really was removed from  
anything at all, and if it was, which object did it get removed  
from?  (a snippet of my code is below)
           my @seqids  = $contig->get_seq_ids();
           print OUT "seqids @seqids\n";
           my $seqobj = $contig->get_seq_by_name($seq);
           $contig->remove_seq($seqobj) || die "failed to remove seq\n";
           @seqids  = $contig->get_seq_ids();
           print OUT "seqids @seqids\n";

thanks for any advice,

Janet Young


-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung at fhcrc.org

http://www.fhcrc.org/labs/trask/

-------------------------------------------------------------------




More information about the Bioperl-l mailing list