[Bioperl-l] indexed fastq files

Fri Feb 26 07:20:27 UTC 2010

Hi all, would it be OK if I add an optional offset argument to fetch() so that this seek() lands on the start of the record?
Bio/Index/AbstractSeq.pm:131

sub fetch {
	my( $self, $id, $db_file_offset ) = @_;
	my $db = $self->db();
	my $seq;

	if (my $rec = $db->{ $id }) {
		my ($file, $begin) = $self->unpack_record( $rec );

		# Get the (possibly cached) SeqIO object
		my $seqio = $self->_get_SeqIO_object( $file );
		my $fh = $seqio->_fh();

		# move to start of record
		# $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
		$begin-- if(defined($db_file_offset)); # avilella 20100224
		seek($fh, $begin, 0);

		$seq = $seqio->next_seq();	
	}

	# we essentially assume that the primary_id for the database
	# is the display_id
	if (ref($seq) && $seq->isa('Bio::PrimarySeqI') &&
		 $seq->primary_id =~ /^\D+$/) {
		$seq->primary_id( $seq->display_id() );
	}
	return $seq;
}
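
To show the symptom in isolation, here is a minimal sketch that does not touch BioPerl at all. The two records and the "stored" offset are made up for illustration; I am only assuming that the offset coming back from the index is one byte past the true start of the record, which is what the behaviour I see suggests.

```perl
use strict;
use warnings;

# Two FASTQ records held in memory; the contents are made up for illustration.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+\nIIII\n";

# Hypothetical offsets: the true start of read2, and a value one byte past it,
# which is an assumption about what the index hands back in my case.
my $true_begin   = index($data, '@read2');
my $stored_begin = $true_begin + 1;

# Read one line starting at the given byte offset.
sub first_line_at {
    my ($offset) = @_;
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $line = <$fh>;
    close($fh);
    chomp $line;
    return $line;
}

my $chopped = first_line_at($stored_begin);      # header line without its '@'
my $fixed   = first_line_at($stored_begin - 1);  # full header line
print "without correction: $chopped\n";
print "with correction:    $fixed\n";
```

With the uncorrected offset the header comes back without its leading '@', so a FASTQ parser no longer sees a valid record start; subtracting one restores it.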

On Wed, Feb 24, 2010 at 11:45 AM, Albert Vilella <avilella at gmail.com> wrote:
> BTW, I should mention that my index file was created with these options
> on the same linux system:
>
> my $db  = Bio::Index::Fastq->new(-filename => $fastafile,
> -dbm_package=>'DB_File');
>
> So it looks like a general DB_File issue rather than a "Win DB_File" one...
>
> On Wed, Feb 24, 2010 at 11:32 AM, Albert Vilella <avilella at gmail.com> wrote:
>> Hi Chris,
>>
>> I am finding that Bio::Index::Fastq seek is chopping off the first
>> character of the fastq entry. I'm on Linux using bioperl-1.6.1 and
>> debugged the problem to this point in AbstractSeq.pm:143, where there
>> is this funny commented line:
>>
>>                # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>>                seek($fh, $begin, 0);
>>
>> If I apply this $begin--, everything works fine, but I am not using
>> windows, I am on a Linux cluster.
>>
>> Any ideas why this was tagged as a "Win DB_File bug"?
>>
>> Cheers,
>>
>> Albert.
>>
>> On Mon, Jan 4, 2010 at 9:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>> Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
>>>
>>> chris
>>>
>>> On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:
>>>
>>>> Hi all,
>>>>
>>>> What is the best way to index fastq files, so that once clustered, I
>>>> can provide a list of seq_ids and get
>>>> them back in fastq format from the indexed db?
>>>>
>>>> Cheers,
>>>>
>>>> Albert.