[Biopython] sff into fasta and qual then trim

Chris Friedline cfriedline at vcu.edu
Tue Oct 23 12:58:31 EDT 2012


Csaba,

As Peter said, many packages will convert sff to fastq/fasta.  I wonder whether you're running into disk performance issues rather than algorithmic ones, though.  Timing the same conversion with Biopython's SeqIO.convert should tell you that much; three hours for 3 million reads does seem slow (at least compared with the systems I work on).
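
A minimal sketch of that timing check, assuming Biopython is installed. The helper name timed_convert and the file names are hypothetical; SeqIO.convert and the "sff-trim" input format (which applies the clip positions stored in the SFF file) are part of Biopython. If this finishes in minutes rather than hours on the same machine, the disk is probably not the bottleneck.

```python
import time


def timed_convert(sff_path, out_path, out_format="fasta",
                  in_format="sff-trim", convert=None):
    """Convert an SFF file and return (record_count, elapsed_seconds).

    `convert` defaults to Bio.SeqIO.convert; it is a parameter only so the
    timing logic can be exercised without Biopython or a real SFF file.
    """
    if convert is None:
        # Imported lazily so the module loads even where Biopython is absent.
        from Bio import SeqIO
        convert = SeqIO.convert
    start = time.perf_counter()
    count = convert(sff_path, in_format, out_path, out_format)
    elapsed = time.perf_counter() - start
    return count, elapsed


# Example usage (placeholder file names, not run here):
#   n, secs = timed_convert("sd11.sff", "sd11.fasta", "fasta")
#   n, secs = timed_convert("sd11.sff", "sd11.qual", "qual")
#   print(f"{n} reads in {secs:.1f} s")
```

Writing the output to a different disk (or to a RAM-backed path) and comparing the two timings is one rough way to separate I/O cost from parsing cost.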

Chris
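
P.S. For the trimming half of the question, here is a rough sketch of how the quoted trim.seqs settings (minlength=50, maxhomop=8, qwindowsize=50, qwindowaverage=22) might be approximated in Python. This is an assumption-laden simplification, not a reimplementation of mothur: it steps through non-overlapping windows from the 5' end and cuts the read at the first window whose mean quality falls below the threshold, and mothur's own windowing may behave differently. All function names are hypothetical; only SeqIO.parse, SeqIO.write, and letter_annotations["phred_quality"] come from Biopython.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base (0 if empty)."""
    runs = (sum(1 for _ in run) for _, run in groupby(str(seq).upper()))
    return max(runs, default=0)


def window_trim_length(quals, window=50, min_average=22):
    """Number of leading bases to keep.

    Assumption: non-overlapping windows; stop at the first window whose
    mean quality is below min_average. A final partial window is scored
    on the bases it actually contains.
    """
    keep = 0
    for start in range(0, len(quals), window):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_average:
            break
        keep = start + len(chunk)
    return keep


def trim_record(seq, quals, min_length=50, max_homop=8,
                window=50, min_average=22):
    """Return trimmed (seq, quals), or None if the read should be dropped."""
    keep = window_trim_length(quals, window, min_average)
    seq, quals = seq[:keep], quals[:keep]
    if len(seq) < min_length or longest_homopolymer(seq) > max_homop:
        return None
    return seq, quals


def trim_sff(sff_path, fastq_path, **limits):
    """Read an SFF file, trim each read, and write survivors as FASTQ."""
    from Bio import SeqIO  # lazy import: the helpers above need no Biopython

    def kept():
        for rec in SeqIO.parse(sff_path, "sff-trim"):
            quals = rec.letter_annotations["phred_quality"]
            result = trim_record(str(rec.seq), quals, **limits)
            if result is not None:
                # Slicing a SeqRecord keeps its per-letter qualities aligned.
                yield rec[:len(result[0])]

    return SeqIO.write(kept(), fastq_path, "fastq")
```

Something like trim_sff("sd11.sff", "sd11.trimmed.fastq") would then replace both mothur calls in one pass (file names are placeholders). Comparing its output with mothur's on a small sample would be the sensible check before relying on it.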
  
On Oct 23, 2012, at 12:47 PM, "Kiss, Csaba" <csaba.kiss at lanl.gov> wrote:

> Hi Christopher!
> I am writing a Python script to analyze antibody sequences. I have been using mothur to convert the sff files to fasta and then trim the sequences for quality. 
> For the end users' sake, it would be easier if all they needed to install was Python, so they could bypass mothur. I have been happy with mothur until now, when I tried to use it on my desktop computer and it took 3 hours to convert 3 million reads from sff to fasta. I hoped that pure Python would be faster. 
> I will look at Pycogent and QIIME.
> Thanks
> Csaba
> 
> -----Original Message-----
> From: Christopher Friedline [mailto:cfriedline at mymail.vcu.edu] On Behalf Of Chris Friedline
> Sent: Tuesday, October 23, 2012 10:39 AM
> To: Kiss, Csaba
> Cc: biopython at lists.open-bio.org
> Subject: Re: [Biopython] sff into fasta and qual then trim
> 
> Are you trying to replace an entire analysis pipeline, which mothur provides, or simply take control of the read trimming routines?  Mothur has been excellent for us (though I do supplement with my own code frequently), and I have a hard time believing that BioPython (or Python, in general) would be faster for these types of things.  If you are married to Python, you may want to join in with the QIIME people, though they back their stuff with PyCogent rather than BioPython.  Both are excellent packages for automating some parts of the analysis in microbial community studies.  We can leave the philosophy of pipelining scientific research for another thread.  ;-)
> 
> I wonder if the effort of reimplementing common trimming/filtering tasks is worth your time, given the current maturity of both mothur and QIIME.
> 
> On Oct 23, 2012, at 12:04 PM, "Kiss, Csaba" <csaba.kiss at lanl.gov> wrote:
> 
>> I am new to Biopython. I am trying to replace mothur with Biopython.
>> I hope that Biopython is faster than mothur. All I want to do is this:
>> 
>> sffinfo(sff=sd11.sff)
>> trim.seqs(fasta=sd11.fasta, qfile=sd11.qual, minlength=50, maxhomop=8, qwindowsize=50, qwindowaverage=22)
>> 
>> Can someone help me translate the two mothur statements above to Biopython, please?
>> It would be greatly appreciated.
>> Thanks
>> 
>> 
>> --
>> Best Regards:
>> Csaba Kiss PhD, MSc, BSc
>> TA-43, HRL-1, MS888
>> Los Alamos National Laboratory
>> Work:    1-505-667-9898
>> Cell:   1-505-920-5774
>> 
>> 
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org 
>> http://lists.open-bio.org/mailman/listinfo/biopython
> 
> PhD Candidate, Integrative Life Sciences
> Virginia Commonwealth University
> Richmond, VA
> 



