[Bioperl-l] question temp files in blast

Francisco J. Ossandón fossandonc at hotmail.com
Wed Dec 11 12:57:19 UTC 2013


Hello Dimitar,
Do you expect to have several instances of the script running at the same
time?

If there is only one instance of the script, it could be easier to assign an
increasing counter to the smaller fastas (seq1.fa, seq2.fa... seqX.fa), and
then use the fasta filename as the base for the blast output filename
(seq1.blastout.txt, seq2.blastout.txt... seqX.blastout.txt).
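A minimal sketch of the counter-based naming described above. The helper
name piece_names is hypothetical, used here only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: build the FASTA piece name and the matching BLAST
# output name for piece number $n (seq1.fa -> seq1.blastout.txt).
sub piece_names {
    my ($n) = @_;
    my $fasta = "seq$n.fa";
    (my $blast = $fasta) =~ s/\.fa$/.blastout.txt/;
    return ($fasta, $blast);
}

for my $n (1 .. 3) {
    my ($fasta, $blast) = piece_names($n);
    print "$fasta -> $blast\n";
}
```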

If there are multiple instances, you could add to the filename the original
fasta name and the 'time' function return value (I think it would be
unlikely to process 2 files with the same name and starting at the same
time). Something like:

my $file_in = 'original.fa';
my $time = time;
my $counter = 0;
foreach my $fasta_piece (@fasta_pieces) {
	$counter++;
	# Strip the extension from the original name, then append time and counter
	my ($file_out) = ($file_in =~ m/^(.+)\.fa$/i);
	$file_out .= ".$time.seq$counter.fa"; # Resulting in 'original.1386766006.seq1.fa'

	my ($blast_result) = ($file_out =~ m/^(.+)\.fa$/i);
	$blast_result .= '.blast_out.txt'; # Resulting in 'original.1386766006.seq1.blast_out.txt'
}

That would add some specificity (temporary files with the same base name) and
some uniqueness (counter and execution time). The filenames can be a little
long, but I like it because all files are grouped by their base name, so I
can list/copy/move/delete them together.
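If two instances could start within the same second on the same input, the
time value alone would not tell them apart. The sketch below is one possible
variant, not part of the suggestion above: it also puts the process ID ($$)
into the name. The helper name tagged_names and the position of the PID in
the filename are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the piece name and the BLAST output name from
# the original FASTA name, a start time, a process ID and a piece counter.
sub tagged_names {
    my ($file_in, $time, $pid, $counter) = @_;
    (my $base = $file_in) =~ s/\.fa$//i;    # 'original.fa' -> 'original'
    my $piece = "$base.$time.$pid.seq$counter.fa";
    (my $blast = $piece) =~ s/\.fa$/.blast_out.txt/i;
    return ($piece, $blast);
}

my ($piece, $blast) = tagged_names('original.fa', time, $$, 1);
print "$piece\n$blast\n";
```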

Or maybe that's not enough for your needs?

Cheers,

Francisco J. Ossandon

-----Original message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On behalf of
dimitark at bii.a-star.edu.sg
Sent: Tuesday, 10 December 2013 22:54
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] question temp files in blast

Hi guys,
I have a question about StandAloneBlastPlus and File::Temp.

I encountered a problem in my script that arises from File::Temp. In a
previous email I said I forced StandAloneBlastPlus to accept a TEMP_DIR,
which I pass in by modifying BlastMethods.pm and StandAloneBlastPlus.pm.
This works, but not always, because File::Temp uses the built-in Perl
function rand(), which is seeded by srand().

In brief: my script splits a large FASTA file into smaller ones and starts a
new BLAST thread for each of them, with as many threads as desired. It also
creates a separate TEMP_DIR for each thread, in which the temporary BLAST
files are stored: file.fas and the blast_result. However, because of rand(),
some file names clash since there is not enough randomness, and some of my
threads die; not always, but very often.

So my question is the following. Should I modify BlastMethods.pm and
StandAloneBlastPlus.pm further so that I can specify the temp file names
manually, or should I use another module such as Math::Random::Secure to
produce a truly random number that I can pass to srand() after I create my
threads, so that the temp file names do not clash?
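The reseeding idea can be illustrated without any extra module. The sketch
below assumes a Unix-like system and worker processes created with fork
(the same idea should apply to threads, but that is not shown here). Workers
that inherit an already seeded generator draw the same rand() values, while
calling the built-in srand() with no argument inside each worker gives it a
fresh seed. The helper name child_output is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code in a forked child and return the single
# line that the child prints.
sub child_output {
    my ($code) = @_;
    my $pid = open(my $fh, '-|');    # forking open: child's STDOUT feeds $fh
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                 # child process
        my $value = $code->();
        print "$value\n";
        exit 0;
    }
    my $line = <$fh>;                # parent reads the child's output
    close $fh;
    chomp $line;
    return $line;
}

srand(42);    # the parent seeds the generator once; children inherit this state

my @inherited = map { child_output(sub { int(rand(1e9)) }) } 1 .. 2;
my @reseeded  = map { child_output(sub { srand(); int(rand(1e9)) }) } 1 .. 2;

print "inherited: @inherited\n";    # both children draw the same number
print "reseeded:  @reseeded\n";     # each child draws from its own fresh seed
```

Whether this is enough to stop File::Temp name clashes in a particular setup
would still need to be checked against the script in question.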

The easiest option is to use an additional module, but that means another
dependency just for one random number. On the other hand, if I modify the
current modules, I can be sure that temp file names will never clash, and
there will be no further dependencies.

I am sorry if my email seems messy, but I tried to keep it brief.

Any advice is welcome!

Thank you for your time

Cheers
Dimitar


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



