[BioPython] Reading Roche 454 binary SFF files in Python

Jose Blanca jblanca at btc.upv.es
Wed Apr 15 03:07:27 EDT 2009


> I was aware that some information was available about the SFF file
> format, and it should be possible to reverse engineer the format in
> order to read and write it directly from Biopython.
The SFF format is fully documented on the NCBI SRA web site:
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=formats#sff

> Right now with your code under the GPL, we can't incorporate it into
> Biopython, but if you and Bastien are prepared to offer it to
> Biopython under our MIT/BSD licence that could be very useful.  Even
> without that, any documentation on the file format or example files
> you might be able to share could be valuable.
I guess it wouldn't be a problem to offer you the code under your
licence, but I don't think that's the best approach. The code as it stands
is not well suited to being integrated into a library. It would be easier to
rewrite the SFF reading part from scratch, and I could do that for you
quickly. The main problem would be getting SFF files small enough to be used
in the tests. If you could provide those, I could write the code to extract
the information from the SFF file for you. It would be easy to build a
generator that delivers the sequences one by one.
sff_extract is also able to split paired-end reads. That's the part that
Bastien wrote. Integrating that would be nice, but I think that in Biopython
it should be treated as an independent problem.
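
As a rough illustration of the generator idea above, here is a minimal
Python sketch. It is not the sff_extract code and not a Biopython API: the
names iter_sff_reads and SffRead are hypothetical, the field layout follows
my reading of the SFF description on the NCBI site (big-endian fields,
sections padded to 8 bytes), and it assumes the optional index block does
not sit between the reads.

```python
import struct
from collections import namedtuple

# Hypothetical record type for this sketch (not from sff_extract or Biopython).
SffRead = namedtuple(
    "SffRead", "name bases qualities clip_qual_left clip_qual_right")

COMMON_HEADER = struct.Struct(">4s4BQIIHHHB")  # fixed part, 31 bytes
READ_HEADER = struct.Struct(">2HI4H")          # fixed part, 16 bytes


def _pad(length):
    """Number of padding bytes needed to reach a multiple of 8."""
    return -length % 8


def iter_sff_reads(handle):
    """Yield SffRead records one by one from a binary SFF file handle.

    Assumption: the index block, if present, is not located between reads.
    """
    fields = COMMON_HEADER.unpack(handle.read(COMMON_HEADER.size))
    magic, version = fields[0], fields[1:5]
    (index_offset, index_length, number_of_reads, header_length,
     key_length, flows_per_read, flowgram_format) = fields[5:]
    if magic != b".sff" or version != (0, 0, 0, 1):
        raise ValueError("Not an SFF version 1 file")
    # Skip the rest of the common header: flow chars, key sequence, padding.
    handle.read(header_length - COMMON_HEADER.size)

    for _ in range(number_of_reads):
        (read_header_length, name_length, number_of_bases,
         clip_qual_left, clip_qual_right,
         clip_adapter_left, clip_adapter_right) = READ_HEADER.unpack(
            handle.read(READ_HEADER.size))
        name = handle.read(name_length).decode("ascii")
        # Skip the padding that closes the read header.
        handle.read(read_header_length - READ_HEADER.size - name_length)

        handle.read(2 * flows_per_read)  # flowgram values, unused here
        handle.read(number_of_bases)     # flow index per base, unused here
        bases = handle.read(number_of_bases).decode("ascii")
        qualities = list(handle.read(number_of_bases))
        # Skip the padding that closes the read data section.
        handle.read(_pad(2 * flows_per_read + 3 * number_of_bases))

        yield SffRead(name, bases, qualities,
                      clip_qual_left, clip_qual_right)
```

Opening a file with open(filename, "rb") and looping over
iter_sff_reads(handle) would then give the reads one at a time without
loading the whole file into memory. Clipping and paired-end splitting are
deliberately left out.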

> P.S. Have you tested your sff_extract software on SFF files from the
> new Roche v2 software, released about the same time as the "titanium"
> 454 upgrade?
Not me, but I think Bastien has, and he found no problems at all with
it. The SFF format is well thought out and consistent; the 454 people did a
much better job than the ABI people did with the ABI format.
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

