[Biojava-dev] [Biojava-l] FASTA Header Parser

Hannes Brandstätter-Müller biojava at hannes.oib.com
Sat Feb 25 05:49:32 UTC 2012


Hi!

I just looked over the code. Just to verify my understanding: You do
not parse the quality scores in any way (except checking if all chars
are in the correct range)? I would need direct access to the concrete
score values of each position for my project, so I am thinking about

1) enhancing the DNASequence (and the Compound classes, to be more
exact) to be able to hold quality information
2) enhancing the FASTQ reader and writer to read into and write from
the 3.0 DNASequence classes
3) implementing a reader for the FASTA/QUAL format too
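
To make "concrete score values of each position" explicit, here is a
minimal sketch of what I mean. The class and method names are made up
for illustration and are not part of BioJava; the only facts assumed are
the usual ASCII offsets (33 for Sanger, 64 for Solexa and Illumina 1.3+).

```java
import java.util.Arrays;

// Hypothetical helper, not an existing BioJava class.
public class QualityDecoder {
    /** ASCII offset of Sanger-style FASTQ quality characters. */
    public static final int SANGER_OFFSET = 33;
    /** ASCII offset of Solexa and Illumina 1.3+ quality characters. */
    public static final int ILLUMINA_OFFSET = 64;

    /** Turns a quality line into one integer score per position. */
    public static int[] decode(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            // Each character encodes one score as (score + offset).
            scores[i] = quality.charAt(i) - offset;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Prints [40, 40, 20, 10, 0]
        System.out.println(Arrays.toString(decode("II5+!", SANGER_OFFSET)));
    }
}
```

An array like this is what the enhanced Compound classes would need to
carry alongside the bases.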

As a side effect, you could use the route via the DNASequence to
convert between the Illumina, Sanger, and Solexa formats (which, as far
as I understand it, is not supported yet).
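
A rough sketch of such a conversion, assuming the standard relation
between Solexa and Phred scores, Q_phred = 10 * log10(10^(Q_solexa/10) + 1),
and rounding to the nearest integer. The class and method names are
hypothetical, not BioJava API.

```java
// Hypothetical converter, not an existing BioJava class.
public class QualityConverter {
    private static final int SANGER_OFFSET = 33;
    private static final int SOLEXA_OFFSET = 64;

    /** Converts one Solexa score to the nearest integer Phred score. */
    public static int solexaToPhred(int solexa) {
        double phred = 10.0 * Math.log10(Math.pow(10.0, solexa / 10.0) + 1.0);
        return (int) Math.round(phred);
    }

    /** Re-encodes a Solexa quality line as a Sanger quality line. */
    public static String solexaToSanger(String quality) {
        StringBuilder out = new StringBuilder(quality.length());
        for (int i = 0; i < quality.length(); i++) {
            int solexa = quality.charAt(i) - SOLEXA_OFFSET;
            out.append((char) (solexaToPhred(solexa) + SANGER_OFFSET));
        }
        return out.toString();
    }

    /** Illumina 1.3+ already stores Phred scores, so only the offset changes. */
    public static String illuminaToSanger(String quality) {
        StringBuilder out = new StringBuilder(quality.length());
        for (int i = 0; i < quality.length(); i++) {
            out.append((char) (quality.charAt(i) - SOLEXA_OFFSET + SANGER_OFFSET));
        }
        return out.toString();
    }
}
```

Going through integer scores loses a little precision for low Solexa
values, which is probably acceptable for a format translation.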

Hannes

On Fri, Jan 13, 2012 at 06:30, Michael Heuer <heuermh at gmail.com> wrote:
> Thanks, Scooter.
>
> I committed to a new module biojava3-sequencing since I wasn't sure
> where in biojava3-genome the new package should go.  I saw an io
> package with feature readers, but fastq is more a sequencing format.
> Feel free to move it around if you can think of a better place for it.
>
> I'll need a day or two to become more familiar with the biojava3 core
> before I can add the static helper method.  I'm still a 1.x guy I
> guess.
>
>   michael
>
>
> On Thu, Jan 12, 2012 at 11:21 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>> I think Git is a read-only copy.
>>
>> Can you add a static helper method, in the same way we have
>> FastaReaderHelper in the core? This way we can hide the implementation
>> details. You can also map the QC attributes to the sequence as metadata
>> and set the original header. This way we can write back the exact format
>> as written. If you get the code into genomics I can also add some
>> additional features.
>>
>>
>>
>> On 1/12/12 12:15 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>
>>>Thanks, will do this evening, since I don't have ssh access at work
>>>for svn+ssh.  Or is it possible to commit to the git repository?
>>>
>>>  michael
>>>
>>>
>>>On Thu, Jan 12, 2012 at 11:11 AM, Scooter Willis <HWillis at scripps.edu>
>>>wrote:
>>>> Michael
>>>>
>>>> You can put the source in the genomics module. At some point we can
>>>> probably use the same code in core for a sequence proxy loader option so
>>>> that you can load huge fastq files with a lazy loading of the sequences.
>>>> This way you don't burn through memory.
>>>>
>>>> Scooter
>>>>
>>>> On 1/12/12 12:05 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>>>
>>>>>Hannes Brandstätter-Müller wrote:
>>>>>> On Wed, Jan 11, 2012 at 22:34, Michael Heuer <heuermh at gmail.com>
>>>>>>wrote:
>>>>>>> Hannes Brandstätter-Müller wrote:
>>>>>>>> On Wed, Jan 11, 2012 at 16:24, Scooter Willis <HWillis at scripps.edu>
>>>>>>>>wrote:
>>>>>>>>> Is this a custom header or something output from a sequencing
>>>>>>>>> instrument/software?
>>>>>>>>
>>>>>>>> It's the output of the Roche/454 Titanium FLX Sequencer
>>>>>>>
>>>>>>> If you would rather, biojava has support for the FASTQ file format,
>>>>>>
>>>>>> We already had this discussion this month or last. FASTQ support is
>>>>>> still not in 3.0, only in 1.8. I will work on porting that to 3.0 and
>>>>>> supporting FASTA/QUAL too, most likely, but I will only have time for
>>>>>> that starting in April, after the project that I'm currently working on
>>>>>> is finished.
>>>>>
>>>>>There is no reason to port; the fastq package in biojava-legacy is
>>>>>completely self-contained.  If having it in a separate jar is really a
>>>>>problem, I could copy it over to the trunk for the next release of
>>>>>biojava3.
>>>>>
>>>>>   michael
>>>>>
>>>>>_______________________________________________
>>>>>biojava-dev mailing list
>>>>>biojava-dev at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>



