[Bioperl-l] need help ??parse AcNum from fasta?

outaleb Issame outaleb at web.de
Tue Oct 2 14:51:05 UTC 2007


hi again,
I think I can solve this problem with the id_parser() method.
How can I do that?
Any suggestions or experience?
thx again
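
One way to approach the id_parser() idea is to write a small sub that pulls the bare IPI accession out of a header line. The sketch below is only an illustration: the sub name get_ipi_acc and the regex are assumptions based on the header format quoted further down in this thread, and the version suffix (.1) is dropped on the assumption that the accession list holds unversioned IDs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): extract the bare IPI accession
# from a FASTA header line such as
#   >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
# Returns the accession without its version suffix, or undef if none is found.
# The leading ">" is optional so the sub also works on a bare ID string.
sub get_ipi_acc {
    my ($header) = @_;
    return $header =~ /^>?\s*IPI:(IPI\d+)/ ? $1 : undef;
}

# Quick check against the header format shown in the thread
for my $header (
    '>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein',
    '>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein',
) {
    my $acc = get_ipi_acc($header);
    print defined $acc ? "$acc\n" : "no IPI accession found\n";
}
```

If BioPerl is installed, a reference to such a sub can be handed to a Bio::Index::Fasta object with $inx->id_parser(\&get_ipi_acc) before calling make_index, so that entries are indexed by the bare accession. I have not tested that wiring against the IPI file, so treat it as a starting point.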



outaleb Issame wrote:

>thx for the help, but I got an empty output file.
>I think the problem is with matching the accession number. My fasta file looks like:
>
> >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>DDHHHU...
> >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>DDHHHU..
> >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>MMMMM..
>
>and my accession number file looks like:
>IPI00177321
>IPI00453473
>
>I hope this helps to understand the problem.
>
>
>Nathan S. Haigh wrote:
>
>>outaleb Issame wrote:
>>
>>>hi,
>>>by "this file" I mean that I picked out these accession numbers from
>>>the IPI-Human database. They come from a fasta file,
>>>so they are now listed one under another, like in a table, in a separate file.
>>>What I want to know is how I can check whether they are really present in
>>>the fasta file (the IPI-Human fasta file);
>>>if yes, copy the entire entry for that number (the > header line and
>>>the sequence) into a new fasta file, so that at the end I get a new
>>>fasta file with just these IPI accession numbers.
>>>thx, and I hope that was clear.
>>>
>>Ok, first of all, I'd read the contents of your accession numbers file into a
>>hash, something like the following (this could be written in a shorter
>>form, but since you're a newbie I'll leave it in a longer form so you
>>can follow it more easily).
>>
>>-- start script --
>>use strict;
>>use Bio::SeqIO;
>>
>># change the following three lines to point to the relevant paths
>># of your list of accessions file, your fasta file and your output
>># fasta file
>>my $acc_file = "/path/to/your/file";
>>my $fasta_file_in = "/path/to/your/fasta/file";
>>my $fasta_file_out = "/path/to/your/fasta/output/file";
>>
>># Use a hash to keep a record of accessions we want to find
>>my %hash_of_req_acc;
>>
>># read all the required accessions from the file into the hash as keys
>>open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
>>while (<ACC_FILE>) {
>> my $line = $_;
>> chomp $line;
>> # key on the chomped line, not $_ (which still ends in a newline)
>> $hash_of_req_acc{$line} = 1;
>>}
>>close ACC_FILE;
>>
>>my $seqio_object_in = Bio::SeqIO->new(
>> -file => $fasta_file_in,
>> -format => 'fasta'
>>);
>># the leading ">" opens the output file for writing
>>my $seqio_object_out = Bio::SeqIO->new(
>> -file => ">$fasta_file_out",
>> -format => 'fasta'
>>);
>>
>># loop through all the sequences in the fasta file
>>while (my $seq_object = $seqio_object_in->next_seq) {
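>> # get the sequence accession for easy matching
>> # (note: the fasta parser may leave accession_number as "unknown";
>> # if so, derive the accession from $seq_object->display_id instead)
>> my $seq_acc = $seq_object->accession_number;
>>
>> # write the sequence object to the output fasta file if we have a
>> # matching accession
>> $seqio_object_out->write_seq($seq_object)
>>   if exists $hash_of_req_acc{$seq_acc};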
>>}
>>-- end script --
>>
>>I haven't tested this, but it should at least get you started. Also, the
>>fasta description line in the output file may not be exactly as it was
>>in the input fasta file - if this really matters, you may need to get
>>back to us. Also, if the input fasta file is huge (many thousands of
>>sequences) it may be wise to create an index of the fasta file in order
>>to speed up retrieval.
>>
>>You may find this page helpful:
>>http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>>Anyway, hope this helps to get you started.
>>Nath
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>  
>
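
For the empty-output problem described above, here is a minimal sketch that avoids BioPerl altogether and copies matching records unchanged, so the description lines stay exactly as they were in the input. It is a swapped-in plain-Perl approach, not the script quoted in this thread. The sub name filter_fasta_by_acc, the regex, and the demo sequences (shortened placeholders) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every FASTA record whose IPI accession is listed
# in the accession file, writing the record text unchanged.
# Takes three already-opened filehandles; returns the number of records written.
sub filter_fasta_by_acc {
    my ($acc_fh, $fasta_fh, $out_fh) = @_;

    # Read the wanted accessions into a hash, trimming whitespace
    # and skipping blank lines
    my %wanted;
    while (my $line = <$acc_fh>) {
        $line =~ s/^\s+|\s+$//g;
        $wanted{$line} = 1 if length $line;
    }

    my ($keep, $written) = (0, 0);
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>/) {
            # New record: decide once, from the header, whether to keep it.
            # The version suffix (e.g. ".1") is ignored for matching.
            my ($acc) = $line =~ /^>\s*IPI:(IPI\d+)/;
            $keep = defined $acc && exists $wanted{$acc};
            $written++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $written;
}

# Demo on in-memory data mirroring the example in the thread
# (sequence lines are placeholders, not real sequences)
my $acc_list = "IPI00177321\nIPI00453473\n";
my $fasta = join '',
    ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein\n",
    "DDHHHU\n",
    ">IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein\n",
    "DDHHHU\n",
    ">IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein\n",
    "MMMMM\n";

my $out = '';
open my $acc_fh,   '<', \$acc_list or die "Cannot open accession list: $!";
open my $fasta_fh, '<', \$fasta    or die "Cannot open fasta data: $!";
open my $out_fh,   '>', \$out      or die "Cannot open output buffer: $!";
my $n = filter_fasta_by_acc($acc_fh, $fasta_fh, $out_fh);
close $out_fh;
print "kept $n record(s)\n$out";
```

With real files, the three handles would simply be opened on the accession list, the IPI fasta file, and the new output file. For a very large fasta file, an index (as suggested earlier in the thread) may still be the faster route.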



