[Biojava-dev] [Biojava-l] FASTA Header Parser

Hannes Brandstätter-Müller biojava at hannes.oib.com
Sat Feb 25 13:06:23 UTC 2012


Scooter,

thanks, that will cover my first point quite well, I think.

I will work on 2) and 3) then…

Hannes

On Sat, Feb 25, 2012 at 12:55, Scooter Willis <HWillis at scripps.edu> wrote:
> Hannes
>
> You can currently add arbitrary features to a sequence based on position
> which should allow you to store quality information. You could create a
> quality feature that goes from the start to finish of the sequence and
> then in the feature retain an array for the quality scores.
>
> Scooter
>
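A minimal sketch of the idea quoted above: one feature that spans the whole sequence and keeps an array of per-base quality scores. The class QualityFeature and its methods are hypothetical and written only for illustration; they are not BioJava classes, and a real version would presumably plug into the existing feature API rather than stand alone.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a position-based feature; not a BioJava class. */
final class QualityFeature {
    private final int start;    // 1-based, inclusive
    private final int end;      // 1-based, inclusive
    private final int[] scores; // one score per base

    QualityFeature(String sequence, int[] scores) {
        if (sequence.length() != scores.length) {
            throw new IllegalArgumentException("need exactly one score per base");
        }
        // The feature covers the sequence from its first to its last base.
        this.start = 1;
        this.end = sequence.length();
        this.scores = Arrays.copyOf(scores, scores.length);
    }

    int getStart() { return start; }

    int getEnd() { return end; }

    /** Returns the quality score of the base at the given 1-based position. */
    int scoreAt(int position) {
        if (position < start || position > end) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return scores[position - 1];
    }

    public static void main(String[] args) {
        QualityFeature q = new QualityFeature("ACGT", new int[] {40, 38, 12, 30});
        System.out.println(q.getStart() + ".." + q.getEnd() + ", score at 3: " + q.scoreAt(3));
    }
}
```

Keeping the scores in the feature rather than in each compound leaves the sequence classes untouched, at the cost of a separate lookup per position.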
> On 2/25/12 12:49 AM, "Hannes Brandstätter-Müller" <biojava at hannes.oib.com> wrote:
>
>>Hi!
>>
>>I just looked over the code. Just to verify my understanding: You do
>>not parse the quality scores in any way (except checking if all chars
>>are in the correct range)? I would need direct access to the concrete
>>score values of each position for my project, so I am thinking about
>>
>>1) enhancing the DNASequence (and the Compound classes, to be more
>>exact) to be able to hold quality information
>>2) enhancing the fastq reader and writer to be able to deal with
>>(input/output) the 3.0 DNASequence classes
>>3) implementing a reader for the FASTA/QUAL format too
>>
>>As a side effect, you could use the route via the DNASequence to
>>translate from Illumina to Sanger to Solexa format (which, as far as I
>>understand it, is not supported yet)
>>
>>Hannes
>>
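The conversion mentioned above comes down to decoding the quality string to integer scores and re-encoding them. The sketch below assumes the usual conventions: Sanger stores Phred scores at ASCII offset 33, Illumina 1.3+ stores Phred scores at offset 64, and Solexa stores Solexa-scaled scores at offset 64. The class FastqQuality is hypothetical, not part of the biojava fastq package.

```java
/** Hypothetical helper for FASTQ quality strings; not a BioJava class. */
final class FastqQuality {
    static final int SANGER_OFFSET = 33;
    static final int ILLUMINA_OFFSET = 64; // also the Solexa offset

    /** Decodes a quality string into integer scores using the given ASCII offset. */
    static int[] decode(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            scores[i] = quality.charAt(i) - offset;
        }
        return scores;
    }

    /** Encodes integer scores into a quality string using the given ASCII offset. */
    static String encode(int[] scores, int offset) {
        StringBuilder sb = new StringBuilder(scores.length);
        for (int score : scores) {
            sb.append((char) (score + offset));
        }
        return sb.toString();
    }

    /** Maps a Solexa-scaled score to the nearest Phred score. */
    static int solexaToPhred(int solexa) {
        return (int) Math.round(10.0 * Math.log10(Math.pow(10.0, solexa / 10.0) + 1.0));
    }

    public static void main(String[] args) {
        // Illumina 1.3+ to Sanger: same Phred scores, different offset.
        int[] scores = decode("hhB", ILLUMINA_OFFSET);
        System.out.println(encode(scores, SANGER_OFFSET));
    }
}
```

Going from Solexa to Sanger needs solexaToPhred applied to each decoded score before encoding, because the two scales differ at the low end.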
>>On Fri, Jan 13, 2012 at 06:30, Michael Heuer <heuermh at gmail.com> wrote:
>>> Thanks, Scooter.
>>>
>>> I committed to a new module biojava3-sequencing since I wasn't sure
>>> where in biojava3-genome the new package should go.  I saw an io
>>> package with feature readers, but fastq is more a sequencing format.
>>> Feel free to move it around if you can think of a better place for it.
>>>
>>> I'll need a day or two to become more familiar with the biojava3 core
>>> before I can add the static helper method.  I'm still a 1.x guy I
>>> guess.
>>>
>>>   michael
>>>
>>>
>>> On Thu, Jan 12, 2012 at 11:21 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>> I think Git is a read-only copy.
>>>>
>>>> Can you do a static helper method in the same way we have
>>>> FastaReaderHelper in the core? This way we can hide the implementation
>>>> details. You can also map the QC attributes to the sequence as metadata
>>>> and set the original header. This way we can write back the exact format
>>>> as written. If you get the code into genomics I can also add some
>>>> additional features.
>>>>
>>>>
>>>>
>>>> On 1/12/12 12:15 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>>>
>>>>>Thanks, will do this evening, since I don't have ssh access at work
>>>>>for svn+ssh.  Or is it possible to commit to the git repository?
>>>>>
>>>>>  michael
>>>>>
>>>>>
>>>>>On Thu, Jan 12, 2012 at 11:11 AM, Scooter Willis <HWillis at scripps.edu> wrote:
>>>>>> Michael
>>>>>>
>>>>>> You can put the source in the genomics module. At some point we can
>>>>>> probably use the same code in core for a sequence proxy loader option,
>>>>>> so that you can load huge fastq files with lazy loading of the
>>>>>> sequences. This way you don't burn through memory.
>>>>>>
>>>>>> Scooter
>>>>>>
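The lazy loading described above could look roughly like this: an iterator that reads one FASTQ record at a time, so only the current record is held in memory. This is a sketch under the assumption of simple four-line records (no wrapped sequence or quality lines); LazyFastq is a hypothetical name, not the proxy loader in biojava3 core.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical streaming FASTQ iterator; each element is {name, sequence, quality}. */
final class LazyFastq implements Iterator<String[]> {
    private final BufferedReader reader;
    private String[] next; // the record that next() will return, or null at end of input

    LazyFastq(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    /** Reads the next four-line record from the reader, or sets next to null at end of input. */
    private void advance() {
        try {
            String header = reader.readLine();
            if (header == null) {
                next = null;
                return;
            }
            String sequence = reader.readLine();
            String separator = reader.readLine();
            String quality = reader.readLine();
            if (!header.startsWith("@") || sequence == null || separator == null
                    || !separator.startsWith("+") || quality == null
                    || quality.length() != sequence.length()) {
                throw new IllegalStateException("malformed FASTQ record: " + header);
            }
            next = new String[] {header.substring(1), sequence, quality};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String[] next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String[] record = next;
        advance();
        return record;
    }

    public static void main(String[] args) {
        String data = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n##\n";
        LazyFastq records = new LazyFastq(new BufferedReader(new StringReader(data)));
        while (records.hasNext()) {
            String[] r = records.next();
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

For a real file the StringReader would be replaced by a FileReader, and the caller stays responsible for closing the reader.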
>>>>>> On 1/12/12 12:05 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>>>>>
>>>>>>>Hannes Brandstätter-Müller wrote:
>>>>>>>> On Wed, Jan 11, 2012 at 22:34, Michael Heuer <heuermh at gmail.com> wrote:
>>>>>>>>> Hannes Brandstätter-Müller wrote:
>>>>>>>>>> On Wed, Jan 11, 2012 at 16:24, Scooter Willis <HWillis at scripps.edu> wrote:
>>>>>>>>>>> Is this a custom header or something output from a sequencing
>>>>>>>>>>> instrument/software?
>>>>>>>>>>
>>>>>>>>>> It's the output of the Roche/454 Titanium FLX Sequencer
>>>>>>>>>
>>>>>>>>> If you would rather, biojava has support for the FASTQ file format,
>>>>>>>>
>>>>>>>> We already had this discussion this month or last. FASTQ support is
>>>>>>>> still not in 3.0, only in 1.8. I will most likely work on porting
>>>>>>>> that to 3.0 and supporting FASTA/QUAL too, but I will only have time
>>>>>>>> for that starting in April, after the project I'm currently working
>>>>>>>> on is finished.
>>>>>>>
>>>>>>>There is no reason to port; the fastq package in biojava-legacy is
>>>>>>>completely self-contained.  If having it in a separate jar is really a
>>>>>>>problem, I could copy it over to the trunk for the next release of
>>>>>>>biojava3.
>>>>>>>
>>>>>>>   michael
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>biojava-dev mailing list
>>>>>>>biojava-dev at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
>>>>
>>
>



