[Bioperl-l] indexed fastq files

Fri Feb 26 08:28:02 EST 2010

Sure, go ahead.  I can look at adding tests for this module as well.

chris

On Feb 26, 2010, at 1:20 AM, Albert Vilella wrote:

> Hi all, would it be fine if I added an offset option to get this seek() to work?
> Bio/Index/AbstractSeq.pm:131
> 
> sub fetch {
> 	my( $self, $id, $db_file_offset ) = @_;
> 	my $db = $self->db();
> 	my $seq;
> 
> 	if (my $rec = $db->{ $id }) {
> 		my ($file, $begin) = $self->unpack_record( $rec );
> 
> 		# Get the (possibly cached) SeqIO object
> 		my $seqio = $self->_get_SeqIO_object( $file );
> 		my $fh = $seqio->_fh();
> 
> 		# move to start of record
> 		# $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
> 		$begin -= $db_file_offset if $db_file_offset; # avilella 20100224: optional offset correction
> 		seek($fh, $begin, 0);
> 
> 		$seq = $seqio->next_seq();	
> 	}
> 
> 	# we essentially assume that the primary_id for the database
> 	# is the display_id
> 	if (ref($seq) && $seq->isa('Bio::PrimarySeqI') &&
> 		 $seq->primary_id =~ /^\D+$/) {
> 		$seq->primary_id( $seq->display_id() );
> 	}
> 	return $seq;
> }
> 
> 
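One plausible way to get exactly this symptom is shown below as a minimal core-Perl sketch, not BioPerl's actual indexer (that the real indexer does this is an assumption, not confirmed here). If the stored offset points one byte past the '@' that opens a record, seeking there returns the header without its '@', and decrementing the offset (which is what the proposed $begin-- does) recovers it. The in-memory FASTQ data, the %offset hash and line_at() are hypothetical, and the sketch assumes unwrapped four-line records.

```perl
use strict;
use warnings;

# Two unwrapped 4-line FASTQ records held in memory (made-up data).
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCC\n+\nHHHH\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

# Hypothetical indexer with an off-by-one: for each header it stores the
# offset of the character just AFTER the '@', not the '@' itself.
my %offset;
while (defined(my $header = <$fh>)) {
    my ($id) = $header =~ /^\@(\S+)/ or die "not a FASTQ header: $header";
    $offset{$id} = tell($fh) - length($header) + 1;
    for (1 .. 3) { my $skip = <$fh> }    # skip sequence, '+' and quality
}

# Seek to a stored offset and return the first line found there.
sub line_at {
    my ($handle, $pos) = @_;
    seek($handle, $pos, 0) or die "seek failed: $!";
    my $line = <$handle>;
    chomp $line;
    return $line;
}

my $chopped = line_at($fh, $offset{read2});        # '@' lost: "read2"
my $fixed   = line_at($fh, $offset{read2} - 1);    # like $begin--: "@read2"
print "$chopped\n$fixed\n";
```

The same one-byte shift could equally be corrected where the offset is recorded rather than where it is used; the sketch only shows why a single byte is enough to drop the '@'.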
> On Wed, Feb 24, 2010 at 11:45 AM, Albert Vilella <avilella at gmail.com> wrote:
>> BTW, I should mention that my index file was created with these options
>> on the same Linux system:
>> 
>> my $db = Bio::Index::Fastq->new(-filename    => $fastafile,
>>                                 -dbm_package => 'DB_File');
>> 
>> So it looks more like a DB_File issue than a "Win DB_File" one...
>> 
>> On Wed, Feb 24, 2010 at 11:32 AM, Albert Vilella <avilella at gmail.com> wrote:
>>> Hi Chris,
>>> 
>>> I am finding that the seek() in Bio::Index::Fastq is chopping off the
>>> first character of each FASTQ entry. I'm on Linux using bioperl-1.6.1
>>> and traced the problem to this point in AbstractSeq.pm:143, where there
>>> is this odd commented-out line:
>>> 
>>>               # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>>>               seek($fh, $begin, 0);
>>> 
>>> If I apply this $begin--, everything works fine, but I am not using
>>> Windows; I am on a Linux cluster.
>>> 
>>> Any ideas why this was tagged as a "Win DB_File bug"?
>>> 
>>> Cheers,
>>> 
>>> Albert.
>>> 
>>> On Mon, Jan 4, 2010 at 9:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>>> Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
>>>> 
>>>> chris
>>>> 
>>>> On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> What is the best way to index FASTQ files so that, once they are
>>>>> clustered, I can provide a list of seq_ids and get them back in FASTQ
>>>>> format from the indexed db?
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Albert.
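Besides Bio::Index::Fastq, the underlying idea can be sketched in core Perl without BioPerl: store each record's starting byte offset under its ID, then seek to that offset to pull records back out for any list of IDs. This is a hypothetical stand-in (build_index() and fetch_record() are illustrative names), not how Bio::Index::Fastq is implemented, and it assumes unwrapped four-line records; stepping four lines at a time avoids mistaking a quality line that starts with '@' for a header.

```perl
use strict;
use warnings;

# Build an id => byte-offset index over a FASTQ filehandle.
# Assumes unwrapped 4-line records (header, sequence, '+', quality).
sub build_index {
    my ($fh) = @_;
    my %index;
    while (1) {
        my $start  = tell($fh);    # offset of the '@' that opens the record
        my $header = <$fh>;
        last unless defined $header;
        my ($id) = $header =~ /^\@(\S+)/
            or die "expected a FASTQ header at byte $start\n";
        $index{$id} = $start;
        for (1 .. 3) {             # skip sequence, '+' line and quality
            defined(my $skip = <$fh>) or die "truncated record for $id\n";
        }
    }
    return \%index;
}

# Return the full 4-line record for $id, or undef if it is not indexed.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    return unless exists $index->{$id};
    seek($fh, $index->{$id}, 0) or die "seek failed: $!";
    my $record = '';
    $record .= scalar <$fh> for 1 .. 4;
    return $record;
}

# Usage: the second quality line is "@@@@", which must not be read as a header.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCC\n+\n\@\@\@\@\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $index = build_index($fh);
for my $id (qw(read2 read1)) {
    print fetch_record($fh, $index, $id);
}
```

For a real file, opening it with '<:raw' keeps tell() and seek() working on plain bytes with no CRLF translation, and the offset hash could be stored in a DBM file (such as the DB_File package mentioned above) so it survives between runs.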
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> 
>>> 
>> 
> 