[Bioperl-l] New commits

Tony Cox avc@sanger.ac.uk
Wed, 31 Oct 2001 13:24:36 +0000 (GMT)


I'm a new contributor to bioperl but have used it a lot for stuff at the Sanger
Centre.

We have a need to manipulate trace files and their quality data in "fastq"
format. I have therefore just committed a new SeqIO/fastq module and an
Index/Fastq module for parsing, writing and indexing them. These use the
SeqWithQuality object.

I made a few very minor tweaks to SeqWithQuality and added an extra call to
SeqIO/raw so that a raw stream can write quality data.

I ran the test suite and there were no errors in the new modules although there
are a number in other places. I guess people know about these?

FYI fastq format entries run thus:

@sequence_id <optional description>
atcgatcgatgctacgtacgtatctagctacgactg.....
+sequence_id <optional description>
!''''''+,41+*)(*(()-%%%((+,56<<;;;>I.....

The DNA sequence and the quality data each occupy a single line. The quality
values are byte-encoded to save space, so taking the ASCII code of each byte
and subtracting 33 will yield the decimal quality value.
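
To make the decoding rule concrete, here is a minimal sketch of reading one
four-line entry and unpacking its quality string. It is written in Python
purely for illustration and is not the code in the committed SeqIO/fastq
module; the function names decode_quality and parse_record are hypothetical,
and the sketch assumes the unwrapped, one-line-per-field layout described
above.

```python
def decode_quality(encoded, offset=33):
    """Turn a byte-encoded quality string into decimal quality values.

    Each character's ASCII code minus the offset (33, per the description
    above) gives one quality value.
    """
    return [ord(ch) - offset for ch in encoded]


def parse_record(lines):
    """Parse one four-line entry: @id line, sequence, +id line, qualities.

    Assumes sequence and quality data are each on a single line.
    Returns (sequence_id, description, sequence, quality_values).
    """
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("entry must start with an '@' line")
    if not plus.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # The ID is the first word after '@'; the rest is the optional description.
    seq_id, _, description = header[1:].partition(" ")
    return seq_id, description, seq, decode_quality(qual)


# '!' is ASCII 33, so it decodes to 0; "'" is ASCII 39, so it decodes to 6.
print(decode_quality("!'+,41"))  # [0, 6, 10, 11, 19, 16]
```

Keeping the offset as a parameter is a design choice of the sketch only; the
format as described uses 33.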

The format is still not set in stone, so things may change slightly (the most
likely change being to allow DNA/quality value lines to wrap à la fasta).
Removing the redundant quality ID line is also likely.

Tony

******************************************************
Tony Cox			Email:avc@sanger.ac.uk
Sanger Institute		WWW:www.sanger.ac.uk
Wellcome Trust Genome Campus	Webmaster
Hinxton				Tel: +44 1223 834244
Cambs. CB10 1SA			Fax: +44 1223 494919
******************************************************