[Bioperl-l] QUERY: Correct use of Bio::SeqIO::largefasta?

Dave Messina David.Messina at sbc.su.se
Mon Jul 27 06:13:27 EDT 2009


Hi Mark,
I'm not familiar with the TFBS code, but these errors you're getting:

Output from 1.7 Megabase DNA fragment:
>
> GET_SEQUENCE:  Sequence too long.
> LOOP_ON_SEQS:  get_sequence failed.
> MAIN:  loop_on_seqs failed.


are coming from the C code that's part of the TFBS package.

So I think your inclination to chop up the longer sequences is correct. I
modified your code to do that (see below).

Also, you might contact the author of the TFBS package (Boris.Lenhard at
bccs.uib.no) and see if he has any suggestions for handling longer sequences
with his code.



Dave

------- CUT HERE ------
#!/usr/bin/perl

use strict;
use warnings;
use TFBS::Matrix::PFM;
use Bio::Seq;
use Bio::SeqIO;

my $matrixref = [
    [ 5,  5,  5,  5,  5,  5,  85, 5,  5,  5,  5,  85 ],
    [ 5,  5,  5,  5,  5,  85, 5,  5,  5,  5,  85, 5 ],
    [ 5,  5,  85, 85, 5,  5,  5,  85, 5,  85, 5,  5 ],
    [ 85, 85, 5,  5,  85, 5,  5,  5,  85, 5,  5,  5 ]
];

my $chunklength = 175000;  # set this to whatever you want your chunk size
to be

my $pfm = TFBS::Matrix::PFM->new(
    -matrix => $matrixref,
    -name   => "CeRep_matrix_1",
    -ID     => "M1000"
);

my $pwm = $pfm->to_PWM();    # convert to position weight matrix

my $stream = Bio::SeqIO->new(
    -format => 'fasta',
    -fh     => \*ARGV
);

while ( my $seq = $stream->next_seq() ) {
    my $seqlength = $seq->length();

    for ( my $i = 1 ; $i <= $seqlength ; $i += $chunklength ) {
        my $start = $i;
        my $end   = $start + $chunklength - 1;
        if ( $end > $seqlength ) { $end = $seqlength; }

        my $subseq = $seq->subseq( $start, $end );

        my $display_id = $seq->display_id . '/' . $start . '-' . $end;

        my $chunk = Bio::Seq->new(
            -seq        => $subseq,
            -display_id => $display_id,
            -alphabet   => $seq->alphabet,
        );

        my $siteset = $pwm->search_seq(
-seqobj => $chunk,
 -threshold => " 75 %",
);
        print $siteset->GFF();
    }
}



More information about the Bioperl-l mailing list