[Bioperl-l] indexed fastq files
Chris Fields
cjfields at illinois.edu
Fri Feb 26 08:28:02 EST 2010
Sure, go ahead. I can look at adding tests for this module as well.
chris
On Feb 26, 2010, at 1:20 AM, Albert Vilella wrote:
> Hi all, would it be fine if I add an offset option to get this seek() to work?
> Bio/Index/AbstractSeq.pm:131
>
> sub fetch {
>     my( $self, $id, $db_file_offset ) = @_;
>     my $db = $self->db();
>     my $seq;
>
>     if (my $rec = $db->{ $id }) {
>         my ($file, $begin) = $self->unpack_record( $rec );
>
>         # Get the (possibly cached) SeqIO object
>         my $seqio = $self->_get_SeqIO_object( $file );
>         my $fh = $seqio->_fh();
>
>         # move to start of record
>         # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>         $begin-- if(defined($db_file_offset)); # avilella 20100224
>         seek($fh, $begin, 0);
>
>         $seq = $seqio->next_seq();
>     }
>
>     # we essentially assume that the primary_id for the database
>     # is the display_id
>     if (ref($seq) && $seq->isa('Bio::PrimarySeqI') &&
>         $seq->primary_id =~ /^\D+$/) {
>         $seq->primary_id( $seq->display_id() );
>     }
>     return $seq;
> }
>
>
> On Wed, Feb 24, 2010 at 11:45 AM, Albert Vilella <avilella at gmail.com> wrote:
>> BTW, I should mention that my index file was created with these options
>> on the same Linux system:
>>
>> my $db = Bio::Index::Fastq->new(-filename    => $fastafile,
>>                                 -dbm_package => 'DB_File');
>>
>> So it looks more like DB_File dependent than "Win DB_File"...
>>
>> On Wed, Feb 24, 2010 at 11:32 AM, Albert Vilella <avilella at gmail.com> wrote:
>>> Hi Chris,
>>>
>>> I am finding that Bio::Index::Fastq seek is chopping off the first
>>> character of the fastq entry. I'm on Linux using bioperl-1.6.1 and
>>> debugged the problem to this point in AbstractSeq.pm:143, where there
>>> is this funny commented line:
>>>
>>> # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>>> seek($fh, $begin, 0);
>>>
>>> If I apply this $begin--, everything works fine, but I am not using
>>> windows, I am on a Linux cluster.
>>>
>>> Any ideas why this was tagged as a "Win DB_File bug"?
>>>
>>> Cheers,
>>>
>>> Albert.
>>>
>>> On Mon, Jan 4, 2010 at 9:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>>> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
>>>>
>>>> chris
>>>>
>>>> On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> What is the best way to index fastq files, so that once clustered, I
>>>>> can provide a list of seq_ids and get
>>>>> them back in fastq format from the indexed db?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Albert.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>
>
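
The symptom described in this thread (the first character of the fastq
entry going missing after seek()) can be reproduced with core Perl alone.
The following is only a minimal sketch under stated assumptions: the
two-record file, the header_at() helper, and the deliberate "+ 1" are
made up for illustration. It is not the Bio::Index code, and it does not
explain why DB_File would hand back an offset that is one byte too high.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny two-record FASTQ file and remember the byte offset at
# which each record starts, as reported by tell() before the write.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
my @records = (
    [ 'read1', 'ACGT', 'IIII' ],
    [ 'read2', 'GGCC', 'HHHH' ],
);
my %offset;
for my $r (@records) {
    my ($id, $seq, $qual) = @$r;
    $offset{$id} = tell($out);
    print {$out} "\@$id\n$seq\n+\n$qual\n";
}
close $out or die "close failed: $!";

# Hypothetical helper: return the first line read at a given byte offset.
sub header_at {
    my ($file, $pos) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    binmode $in;
    seek($in, $pos, 0) or die "seek failed: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

my $good = header_at($path, $offset{read2});      # lands on the '@'
my $bad  = header_at($path, $offset{read2} + 1);  # one byte too far

print "correct offset:  $good\n";
print "offset too high: $bad\n";
```

With the correct offset the header line starts with '@'; with the offset
one byte too high the '@' is skipped, which is the kind of truncation a
$begin-- before seek() would compensate for.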