[Bioperl-l] question temp files in blast

Fields, Christopher J cjfields at illinois.edu
Wed Dec 11 14:00:09 UTC 2013


I think File::Temp generates the random file string based on the time stamp (common practice in UNIX), which rounds to the second.  Might be wrong, but that could be causing the problem, as files could be created at the same time in threads/forks. See this link, which also discusses solutions:

https://metacpan.org/pod/File::Temp#Forking  
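
For what it's worth, the fix described in that section of the File::Temp docs is to reseed the random number generator in each child before it creates temp files. Here is a minimal sketch of that idea, assuming the workers are forked processes (with ithreads the details may differ). The template name, the `.fas` suffix and the worker count are made up for illustration and are not taken from the original script:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Scratch directory shared by all workers (illustrative only).
my $dir = tempdir(CLEANUP => 1);

rand();    # simulate a parent that has already seeded the generator

my @pids;
for my $i (1 .. 4) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: reseed so it does not replay the parent's random sequence
        # when File::Temp picks the XXXXXX part of the name.
        srand();
        my ($fh, $name) = tempfile('blastXXXXXX',
                                   DIR    => $dir,
                                   SUFFIX => '.fas',
                                   UNLINK => 0);
        print {$fh} ">seq$i\nACGT\n";
        close $fh;
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;    # parent waits for every child
```

The only important line is the `srand()` call at the top of the child branch; everything else is scaffolding to make the example run on its own.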

chris

On Dec 11, 2013, at 6:57 AM, Francisco J. Ossandón <fossandonc at hotmail.com> wrote:

> Hello Dimitar,
> Do you expect to have several instances of the script running at the same
> time?
> 
> If there is only 1 instance for the script, it could be easier to assign an
> increasing counter for the smaller fastas (seq1.fa, seq2.fa... seqX.fa), and
> then use the fasta filename as base for the blast output filename
> (seq1.blastout.txt, seq2.blastout.txt... seqX.blastout.txt).
> 
> If there are multiple instances, you could add to the filename the original
> fasta name and the 'time' function return value (I think it would be
> unlikely to process 2 files with the same name and starting at the same
> time). Something like:
> 
> my $file_in = 'original.fa';
> my $time = time;
> my $counter = 0;
> foreach my $fasta_piece (@fasta_pieces) {
> 	$counter++;
> 	# Strip the '.fa' extension to get the base name ('original')
> 	my ($file_out) = ($file_in =~ m/^(.+)\.fa$/i);
> 	# Append time and counter, resulting in 'original.1386766006.seq1.fa'
> 	$file_out .= ".$time.seq$counter.fa";
> 
> 	# Same base for the BLAST output,
> 	# resulting in 'original.1386766006.seq1.blast_out.txt'
> 	my ($blast_result) = ($file_out =~ m/^(.+)\.fa$/i);
> 	$blast_result .= '.blast_out.txt';
> }
> 
> That would add some specificity (temporary files with the same base name)
> and some uniqueness (counter and execution time). The filenames can be a
> little long, but I like it because all files are grouped by their base
> name, so I can list/copy/move/delete them together.
> 
> Or maybe that's not enough for your needs?
> 
> Cheers,
> 
> Francisco J. Ossandon
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On behalf of
> dimitark at bii.a-star.edu.sg
> Sent: Tuesday, December 10, 2013 22:54
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] question temp files in blast
> 
> Hi guys,
> I have a question about StandAloneBlastPlus and File::Temp.
> 
> I encountered a problem which arises from File::Temp in my particular
> script. In a previous email I said I forced StandAloneBlastPlus to accept a
> TEMP_DIR, which I pass in by modifying BlastMethods.pm and
> StandAloneBlastPlus.pm. This works, but not always, and that is because
> File::Temp uses the built-in Perl function rand(), which is seeded by srand().
> 
> Now in brief: my script splits a large FASTA into smaller ones and, for
> each of the smaller ones, starts a new BLAST thread, with as many threads
> as desired. It also creates a special TEMP_DIR for each thread, in which
> the temp BLAST files are stored: file.fas and the blast_result.
> However, because of rand(), some file names clash because there is not
> enough randomness, and some of my threads die; not always, but very
> often.
> 
> So my question is the following. Should I modify BlastMethods.pm and
> StandAloneBlastPlus.pm further so that I can manually specify the names of
> the temp files, or should I use another module like Math::Random::Secure to
> produce a really random number which I can then pass to srand() after I
> create my threads, so that the temp file names do not clash?
> 
> The easiest option is to just use an additional module, but that means
> another dependency just for one random number. On the other hand, if I
> modify the current modules I will be sure that there won't be any chance of
> temp file names clashing, and there will be no further dependencies.
> 
> I am sorry if my email seems too messy, but I tried to keep it really brief.
> 
> Any advice is welcomed!
> 
> Thank you for your time
> 
> Cheers
> Dimitar
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
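
On Dimitar's question about controlling the temp file names per worker: one option that needs no extra dependencies is to put something unique to the worker into the File::Temp template and let `tempdir` create the directory. The sketch below only illustrates that idea. `blast_paths_for`, the worker/chunk numbering and the file names `query.fas` and `blast_result.txt` are hypothetical and are not part of BioPerl or StandAloneBlastPlus:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: build a private scratch directory for one worker
# and return the query and result paths inside it.
sub blast_paths_for {
    my ($base_dir, $worker_id, $chunk_no) = @_;

    # Worker id, process id and chunk number make the prefix unique per job;
    # the trailing XXXXXX is still filled in by File::Temp.
    my $tag = sprintf 'w%s_p%d_c%04d', $worker_id, $$, $chunk_no;
    my $dir = tempdir("${tag}_XXXXXX", DIR => $base_dir, CLEANUP => 0);

    return ("$dir/query.fas", "$dir/blast_result.txt", $dir);
}

my $base = tempdir(CLEANUP => 1);    # parent scratch area for this run
my ($query, $result, $dir) = blast_paths_for($base, 3, 17);
print "$query\n$result\n";
```

Because the prefix already differs between workers, two workers that happen to draw the same random suffix still end up with different directory names.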




