[BioPython] Reading Roche 454 binary SFF files in Python

Peter biopython at maubp.freeserve.co.uk
Tue Apr 14 16:36:09 EDT 2009


Jose Blanca wrote this interesting reply (which I assume he meant to
send to the whole mailing list, not just me):

On 4/14/09, Blanca Postigo Jose Miguel <jblanca at btc.upv.es> wrote:
>
>  > For those that didn't know, the Roche 454 off instrument applications
>  > (available on Linux only I believe) include a command line tool called
>  > "sffinfo" which can convert a binary SFF file into FASTA (using the
>  > command line option -s or -seq) or QUAL format using PHRED qualities
>  > (command line option -q or -qual).  I've been using this myself to get
>  > some Roche 454 SFF read data into Bio.SeqIO in order to manually trim
>  > off primer sequences.
>
> For the ones that do not have the 454 software there's a free software
>  alternative. Some time ago Bastien Chevreux and I created a little utility to
>  convert sff files to fasta and xml (for the ancillary info). It's called
>  sff_extract, is written in python and released under the GPL.
>  You can get the python script here:
>  http://bioinf.comav.upv.es/sff_extract/index.html
>  Maybe I should have announced it here, but I didn't, my fault.
>  If you think this code could be of some interest for you I could talk with
>  Bastien about the possibility of submitting it to biopython. Although in that
>  case it could use some cleaning, it works, but it could be nicer.
>
>  Best regards,
>
>  Jose Blanca

That does sound interesting - if you want, email me a proper release
announcement and I can forward it to the Biopython announcement
mailing list.

I was aware that some information was available about the SFF file
format, and it should be possible to reverse engineer the format in
order to read and write it directly from Biopython.
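
To give a feel for what reading the format directly might involve, here
is a minimal sketch that unpacks the fixed-size start of an SFF file with
the standard library struct module.  The field layout (big-endian, a
".sff" magic number, a four-byte version, index offset and length, read
count, header length, key length, flows per read, flowgram format code,
then the flow characters and key sequence) is an assumption based on the
publicly described SFF layout, and the function and field names are
purely illustrative, not part of Biopython or sff_extract.

```python
import struct

# Assumed layout of the fixed-size part of the SFF common header
# (big-endian). Field names below are illustrative only.
HEADER_FORMAT = ">4s4BQIIHHHB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 31 bytes


def parse_sff_common_header(data):
    """Parse the common header from the first bytes of an SFF file.

    `data` is a bytes object holding at least the whole header.
    Returns a dict of the decoded fields.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("Too short to hold an SFF common header")
    (magic, v0, v1, v2, v3, index_offset, index_length, number_of_reads,
     header_length, key_length, flows_per_read,
     flowgram_format) = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    if magic != b".sff":
        raise ValueError("Missing .sff magic number")
    # The variable-length part follows: flow characters, then the key.
    flow_end = HEADER_SIZE + flows_per_read
    key_end = flow_end + key_length
    if len(data) < key_end:
        raise ValueError("Header is truncated")
    return {
        "version": (v0, v1, v2, v3),
        "index_offset": index_offset,
        "index_length": index_length,
        "number_of_reads": number_of_reads,
        "header_length": header_length,
        "key_length": key_length,
        "flows_per_read": flows_per_read,
        "flowgram_format": flowgram_format,
        "flow_chars": data[HEADER_SIZE:flow_end].decode("ascii"),
        "key_sequence": data[flow_end:key_end].decode("ascii"),
    }
```

In use, one would open the SFF file in binary mode, read enough bytes to
cover the header, and pass them to parse_sff_common_header().  The
per-read headers and flowgram data that follow are not handled here.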

Right now with your code under the GPL, we can't incorporate it into
Biopython, but if you and Bastien are prepared to offer it to
Biopython under our MIT/BSD licence that could be very useful.  Even
without that, any documentation on the file format or example files
you might be able to share could be valuable.

I felt that adding FASTQ and QUAL support to Biopython should come
first, but since the Bio.SeqIO framework is extensible perhaps we
could add native support for SFF files to Biopython later on.  Given
people can use the Roche 454 tools (if they have them) or your open
source sff_extract to get the data out of an SFF file, this isn't
urgent, but is worth thinking about :)
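
As a rough illustration of the FASTA plus QUAL route, here is a small
sketch using only the standard library (not Bio.SeqIO) that pairs up the
sequence and PHRED quality records and trims a known primer from the
start of a read.  The function names and the simple exact-match trimming
rule are assumptions made for the example; real primer trimming would
probably need to tolerate mismatches.

```python
def read_fasta(lines):
    """Return {read_id: sequence} from an iterable of FASTA lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def read_qual(lines):
    """Return {read_id: [PHRED scores]} from an iterable of QUAL lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].extend(int(q) for q in line.split())
    return records


def trim_primer(seq, quals, primer):
    """Remove an exact (case-insensitive) primer match from the 5' end.

    The quality list is trimmed by the same amount so the two stay in
    step. Reads that do not start with the primer are returned as-is.
    """
    if len(seq) != len(quals):
        raise ValueError("Sequence and quality lengths differ")
    if seq.upper().startswith(primer.upper()):
        return seq[len(primer):], quals[len(primer):]
    return seq, quals
```

Both readers accept any iterable of lines, so an open file handle for
the FASTA or QUAL output can be passed in directly.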

Peter

P.S. Have you tested your sff_extract software on SFF files from the
new Roche v2 software, released about the same time as the "titanium"
454 upgrade?

