[Bioperl-l] indexed fastq files
Albert Vilella
avilella at gmail.com
Fri Feb 26 02:20:27 EST 2010
Hi all, would it be fine if I add an offset option to get this seek() to work?
Bio/Index/AbstractSeq.pm:131
sub fetch {
    my( $self, $id, $db_file_offset ) = @_;
    my $db = $self->db();
    my $seq;
    if (my $rec = $db->{ $id }) {
        my ($file, $begin) = $self->unpack_record( $rec );
        # Get the (possibly cached) SeqIO object
        my $seqio = $self->_get_SeqIO_object( $file );
        my $fh = $seqio->_fh();
        # move to start of record
        # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
        $begin-- if(defined($db_file_offset)); # avilella 20100224
        seek($fh, $begin, 0);
        $seq = $seqio->next_seq();
    }
    # we essentially assume that the primary_id for the database
    # is the display_id
    if (ref($seq) && $seq->isa('Bio::PrimarySeqI') &&
        $seq->primary_id =~ /^\D+$/) {
        $seq->primary_id( $seq->display_id() );
    }
    return $seq;
}
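
To illustrate the symptom outside BioPerl, here is a minimal sketch, assuming the stored offset lands one byte past the start of the record. The in-memory FASTQ data and the line_at() helper are hypothetical and not part of Bio::Index; the sketch only shows what seek() does with such an offset, not why the index stores it.

```perl
use strict;
use warnings;

# Hypothetical FASTQ data held in memory; not taken from any real file.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+\nHHHH\n";

open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

# Byte offset of the '@' that starts the second record.
my $begin = index($data, '@read2');

# Hypothetical helper: seek to a byte offset and return the next line.
sub line_at {
    my ($handle, $offset) = @_;
    seek($handle, $offset, 0) or die "seek failed: $!";
    my $line = <$handle>;
    chomp $line;
    return $line;
}

my $exact   = line_at($fh, $begin);      # offset points at '@'
my $one_off = line_at($fh, $begin + 1);  # offset one byte too far

print "exact:   $exact\n";    # prints "exact:   @read2"
print "one off: $one_off\n";  # prints "one off: read2"
```

With the offset one byte too far, the leading '@' is lost, which matches the chopped first character I am seeing; decrementing the offset before seek() restores it.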
On Wed, Feb 24, 2010 at 11:45 AM, Albert Vilella <avilella at gmail.com> wrote:
> BTW, I should mention that my index file was created with these options
> on the same Linux system:
>
> my $db = Bio::Index::Fastq->new(-filename => $fastafile,
> -dbm_package=>'DB_File');
>
> So it looks DB_File-dependent rather than a "Win DB_File" issue...
>
> On Wed, Feb 24, 2010 at 11:32 AM, Albert Vilella <avilella at gmail.com> wrote:
>> Hi Chris,
>>
>> I am finding that Bio::Index::Fastq seek is chopping off the first
>> character of the fastq entry. I'm on Linux using bioperl-1.6.1 and
>> debugged the problem to this point in AbstractSeq.pm:143, where there
>> is this funny commented line:
>>
>> # $begin-- if( $^O =~ /mswin/i); # workaround for Win DB_File bug
>> seek($fh, $begin, 0);
>>
>> If I apply this $begin--, everything works fine, but I am not using
>> windows, I am on a Linux cluster.
>>
>> Any ideas why this was tagged as a "Win DB_File bug"?
>>
>> Cheers,
>>
>> Albert.
>>
>> On Mon, Jan 4, 2010 at 9:59 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>> Bio::Index::Fastq, maybe? To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
>>>
>>> chris
>>>
>>> On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:
>>>
>>>> Hi all,
>>>>
>>>> What is the best way to index fastq files, so that once clustered, I
>>>> can provide a list of seq_ids and get
>>>> them back in fastq format from the indexed db?
>>>>
>>>> Cheers,
>>>>
>>>> Albert.