[Bioperl-l] Mummer parser and data analysis

Wed Oct 20 15:45:45 UTC 2010

On Oct 20, 2010, at 10:00 AM, shalabh sharma wrote:

> Hi All,
>        Is there any module for mummer in Bioperl?
> 
> Also, I need some suggestions and ideas (I think this is the best place to
> ask).
> I am working with a huge data set (around 200 million Illumina reads). Earlier I
> was using blastx and other similar approaches to annotate, but I now think
> that is no longer feasible. I would be very grateful if anyone could give me
> some ideas regarding this.
> 
> Thanks
> Shalabh

Hard to say unless we know a little more about what you are attempting to do.  Not sure why you are using mummer here, but...

This is fairly well covered in the literature for most use cases, and on sites like SEQanswers.  If you are aligning reads to one or more reference genomes or a set of gene models, you should be using something like bowtie/tophat, bwa, etc., with the output in SAM (BioPerl has Perl wrappers for most of these tools).

You can also do the same for metagenome analyses, but you may need to run BLAST and convert the output to SAM (maybe that's what you are doing?).  The samtools package comes with Perl scripts to do that, and samtools itself can then be used to sort the matches, convert to and index a BAM file for fast access, etc.  From there you can use tools like Bio::DB::Sam, R/Bioconductor/Rsamtools, or similar to access the sequences, find coverage statistics, run SNP calls, etc.
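
To give a rough idea of what "coverage statistics" from SAM output can look like, here is a minimal sketch in plain Python (standard library only, not BioPerl or samtools code).  The function names reference_span and summarize_sam are made up for illustration.  It assumes uncompressed, tab-delimited SAM text with the standard column order (flag in column 2, reference name in column 3, CIGAR in column 6), and it only tallies mapped reads and aligned reference bases per reference.  For real data at this scale you would want a sorted, indexed BAM and one of the tools above instead.

```python
import re
from collections import defaultdict

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume bases on the reference sequence.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases spanned by one alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def summarize_sam(lines):
    """Return {reference_name: (mapped_read_count, spanned_reference_bases)}."""
    reads = defaultdict(int)
    bases = defaultdict(int)
    for line in lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        # Flag bit 0x4 marks an unmapped read; '*' means no reference or CIGAR.
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue
        reads[rname] += 1
        bases[rname] += reference_span(cigar)
    return {name: (reads[name], bases[name]) for name in reads}
```

Calling summarize_sam(open("hits.sam")) on a hypothetical file hits.sam would give one entry per reference sequence.  Dividing the spanned bases by the reference length from the @SQ header gives a crude mean depth.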

And, for the record, we do have an experimental mummer parser, but I believe it lies in a branch at the moment (don't think it has been merged yet):

http://github.com/bioperl/bioperl-live/tree/topic/bug-2701

chris