[Bioperl-l] Mummer parser and data analysis

Chris Fields cjfields at illinois.edu
Wed Oct 20 18:25:09 UTC 2010


I recall there being a lot of tools available for these.  In particular, one of my colleagues has used MEGAN with some success:

http://www-ab.informatik.uni-tuebingen.de/software/megan

If the sample is from a specific host (e.g., a gut microbiome), you can start with short-read alignment runs against the host that filter out sequences you are probably not interested in (namely host reads). You can then run alignments against more focused databases (rRNA, for instance, if you are doing meta-transcriptomic analyses). Beyond that, I agree that assembly should be included early in the analysis, if it isn't already the initial step.
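As a rough illustration of the host-filtering step, here is a minimal sketch in Python. It is not an existing BioPerl module, and the function names are hypothetical. It assumes you have already aligned the reads to the host with a short-read aligner that writes SAM, and that FASTQ read names match the SAM QNAME field exactly (paired-end "/1" and "/2" suffixes would need stripping first). Reads whose SAM FLAG lacks the 0x4 "unmapped" bit are treated as host reads and dropped.

```python
from typing import Iterable, Iterator, Set, Tuple


def host_read_ids(sam_lines: Iterable[str]) -> Set[str]:
    """Collect names of reads that aligned to the host reference.

    SAM column 1 is QNAME and column 2 is the bitwise FLAG;
    bit 0x4 set means the read is unmapped.
    """
    ids: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # mapped, so assumed to be a host read
            ids.add(qname)
    return ids


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        name = header[1:].split()[0]  # drop '@' and any description
        yield name, seq, qual


def drop_host_reads(
    fastq_lines: Iterable[str], host_ids: Set[str]
) -> Iterator[Tuple[str, str, str]]:
    """Keep only reads whose names were not mapped to the host."""
    for name, seq, qual in parse_fastq(fastq_lines):
        if name not in host_ids:
            yield name, seq, qual
```

Because both functions stream their input, the FASTQ file never has to be held in memory. Only the set of host read names does, which matters at the 200-million-read scale discussed below.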

chris

On Oct 20, 2010, at 11:35 AM, shalabh sharma wrote:

> Hey Chris,
>              Thanks for the reply , it was really useful.
> Actually you are right, it is metagenomics sample. The thing is i've never worked with that huge amount of data, so i am trying to test some alignment programs (i am just trying to see if i can avoid blastx) so i am trying all the available programs.
> 
> Blasting 200 million reads doesn't seem like the right option (maybe I will assemble first and then BLAST the assembly).
> 
> Thanks
> Shalabh
> 
> 
> On Wed, Oct 20, 2010 at 11:45 AM, Chris Fields <cjfields at illinois.edu> wrote:
> On Oct 20, 2010, at 10:00 AM, shalabh sharma wrote:
> 
> > Hi All,
> > Is there a module for MUMmer in BioPerl?
> >
> > Also, I need some suggestions and ideas (I think this is the best place to
> > ask).
> > I am working with a huge dataset (around 200 million Illumina reads). Earlier I
> > was using blastx and similar approaches for annotation, but I don't think
> > that is feasible now. I would be very grateful if anyone could give me some
> > ideas regarding this.
> >
> > Thanks
> > Shalabh
> 
> Hard to say unless we know a little more about what you are attempting to do.  Not sure why you are using mummer here, but...
> 
> This is fairly well covered in the literature for most use cases, and on sites like SEQanswers.  If you are doing something like aligning reads to reference genome(s) or a set of gene models, you should be using something like bowtie/tophat, bwa, etc., with the output in SAM format (BioPerl has Perl wrappers for most of these tools).
> 
> You can also do the same for metagenome analyses, but you may need to run BLAST and convert the output to SAM (maybe that's what you are doing?).  The samtools package comes with Perl scripts to do that, and samtools itself can then be used to sort the matches, convert to and index a BAM file for fast random access, etc.  From there you can use tools like Bio::DB::Sam, R/Bioconductor/Rsamtools, or similar to access the sequences, compute coverage statistics, run SNP calls, etc.
> 
> And, for the record, we do have an experimental MUMmer parser, but I believe it lives in a branch at the moment (I don't think it has been merged yet):
> 
> http://github.com/bioperl/bioperl-live/tree/topic/bug-2701
> 
> chris
> 
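The samtools-based workflow in the quoted reply ends with computing coverage statistics from the alignments. As a rough sketch of what that step involves, here is a Python illustration that computes per-position read depth directly from SAM text by walking each CIGAR string. It is not the samtools or Bio::DB::Sam implementation, and its counting conventions are assumptions. Only aligned bases (M, =, X) add depth. Deletions and skipped regions (D, N) advance the reference position without adding depth. Unmapped (0x4) and secondary (0x100) records are ignored. In practice you would sort and index a BAM file and let samtools or Bio::DB::Sam do this.

```python
import re
from collections import Counter
from typing import Iterable

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def depth_from_sam(sam_lines: Iterable[str]) -> Counter:
    """Per-position depth keyed by (reference name, 1-based position)."""
    depth: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        # Skip unmapped reads, secondary alignments, and records without a CIGAR.
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue
        ref_pos = pos
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # aligned bases: count toward depth
                for p in range(ref_pos, ref_pos + n):
                    depth[(rname, p)] += 1
                ref_pos += n
            elif op in "DN":  # consume reference only, no depth added
                ref_pos += n
            # I, S, H and P do not consume the reference
    return depth


def mean_depth(depth: Counter, rname: str, ref_length: int) -> float:
    """Mean depth over a reference, counting uncovered positions as zero."""
    total = sum(d for (r, _), d in depth.items() if r == rname)
    return total / ref_length
```

Using a `Counter` means uncovered positions simply read back as 0. For a real metagenome you would want a per-contig array, or better, the indexed BAM route described above.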