[Bioperl-l] script

James Wasmuth james.wasmuth at ed.ac.uk
Thu Sep 4 12:45:05 EDT 2003


Sorry to be anally retentive, but just in case it is on a Windows
machine and the FASTA file has funny carriage-return formatting that
may not be picked up with "\n", use:

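my $count = 0;
my $lines = '';

open IN, "<file.fa" or die "Cannot open file.fa: $!";
while (<IN>)    { $lines .= $_; }    # slurp the whole file into one string
close IN;

# /m lets ^ match at the start of every line, not only at the start of the string
$count++ while ($lines =~ m/^>/mg);

print "No. of seqs: ", $count, "\n";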
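If the file might instead have old Mac-style line endings (a bare carriage return with no newline), a match anchored to "\n"-terminated lines will not see the record breaks at all. A slightly more defensive sketch, assuming the whole file fits in memory, normalises CRLF and bare CR to "\n" before counting lines that start with ">". The sub name count_fasta_headers is just an illustrative name, not anything from BioPerl.

```perl
use strict;
use warnings;

# Illustrative helper (not a BioPerl function): count FASTA header lines
# in a string, whatever line-ending convention the file was saved with.
sub count_fasta_headers {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;               # CRLF (DOS/Windows) or bare CR (old Mac) -> LF
    my $count = () = $text =~ /^>/mg;    # /m: ^ matches at the start of every line
    return $count;
}

# Only read a file when one is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };   # slurp mode: read the whole file at once
    close $fh;
    print "No. of seqs: ", count_fasta_headers($text), "\n";
}
```

Run it as perl count_seqs.pl file.fa (the script name is made up); with no argument it just defines the sub.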


I think all bases have been covered now, except loading it into Bio::SeqIO...
:-p

james


Andreas Kahari wrote:

>If it's not a Unix system, this [untested] Perl snippet will do
>approximately the same thing:
>
>$/ = "\n>";
>$count = 0;
>
>open(IN, "file.fa") or die;
>while (<IN>) { $count++ }
>close(IN);
>
>print "No. of seqs: ", $count, "\n";
>
>
>On Thu, Sep 04, 2003 at 05:12:21PM +0100, James Wasmuth wrote:
>  
>
>>If it's a standard FASTA format file, then at the command line prompt type:
>>
>>grep ">" file.fa | wc -l
>>
>>hth
>>james
>>
>>Lobvi Matamoros wrote:
>>
>>    
>>
>>>Hi:
>>>
>>>Does anyone have a script to count how many proteins there are in a
>>>database/file in FASTA format?
>>>
>>>Thanks in advance for your help
>>>      
>>>
>[cut]
>
>  
>

-- 

Nematode Bioinformatics
Blaxter Nematode Genomics Group
Institute of Cell, Animal and Population Biology
Ashworth Labs
University of Edinburgh
King's Buildings
Edinburgh
EH9 3JT
UK

(+44)(0)131 650 7403




