[Bioperl-l] need help ??parse AcNum from fasta?

Wed Oct 3 00:50:37 UTC 2007

http://bioperl.open-bio.org/wiki/FAQ#How_do_I_use_Bio::Index::Fasta_and_index_on_different_ids.3F
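
A minimal sketch of the id_parser() idea from that FAQ entry, assuming the
headers look like the IPI ones quoted further down in this thread. The
helper name ipi_id_from_header and the index file name are hypothetical;
the commented part uses the Bio::Index::Fasta calls from its synopsis and
needs BioPerl installed, so treat it as untested against your data.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the bare IPI accession out of a FASTA header
# such as ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 ...".
# The version suffix (".1") is dropped so the result matches the
# accession list. Returns an empty list (undef in scalar context) when
# the header carries no IPI identifier.
sub ipi_id_from_header {
    my ($header) = @_;
    return $header =~ /^>?\s*IPI:(IPI\d+)/ ? $1 : ();
}

# With BioPerl installed, the same sub could be handed to the index
# (file names below are placeholders):
#
#   use Bio::Index::Fasta;
#   my $inx = Bio::Index::Fasta->new(-filename   => 'ipi.idx',
#                                    -write_flag => 1);
#   $inx->id_parser(\&ipi_id_from_header);
#   $inx->make_index('/path/to/your/fasta/file');
#   my $seq = $inx->fetch('IPI00453473');   # Bio::Seq object if found

# Quick check on one of the headers from the thread
print ipi_id_from_header(
    '>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein'
), "\n";
```

The regex keeps only the part after "IPI:" and before the version dot,
because the accession list in the thread has no "IPI:" prefix and no
version number.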

On Oct 2, 2007, at 7:51 AM, outaleb Issame wrote:

> hi again,
> I think I can resolve this problem with the id_parser() method.
> How can I do that?
> Any suggestions or experience?
> thx again
>
> outaleb Issame wrote:
>
>> thx for the help, but I got an empty output file.
>> I think it's a problem with matching the acc number. My fasta file
>> looks like:
>>
>> >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> DDHHHU...
>> >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> DDHHHU..
>> >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>> MMMMM..
>>
>> and my AccNum file looks like:
>>
>> IPI00177321
>> IPI00453473
>>
>> I hope this helps to understand.
>>
>> Nathan S. Haigh wrote:
>>
>>
>>
>>> outaleb Issame wrote:
>>>
>>>
>>>
>>>
>>>> hi,
>>>> By this file I mean: I picked these accession numbers out of the
>>>> IPI-Human database. They come from a fasta file, and they are now
>>>> listed one per line, like a table, in a separate file.
>>>> What I want is to check whether they are really present in the
>>>> fasta file (the IPI-Human FASTA file).
>>>> If so, copy the entire entry for each number (the ">..." header
>>>> and the sequence) into a new fasta file, so that at the end I get
>>>> a new FASTA file with just these IPI accession numbers.
>>>> thx, and I hope that was clear.
>>>>
>>> Ok, first of all, I'd read the contents of your accession numbers
>>> file into a hash, something like the following (this could be
>>> written in a shorter form, but since you're a newbie I'll leave it
>>> in a longer form so you can follow it more easily).
>>>
>>> -- start script --
>>> use strict;
>>> use warnings;
>>> use Bio::SeqIO;
>>>
>>> # change the following three lines to point to the relevant paths
>>> # of your list of accessions file, your fasta file and your output
>>> # fasta file
>>> my $acc_file = "/path/to/your/file";
>>> my $fasta_file_in = "/path/to/your/fasta/file";
>>> my $fasta_file_out = "/path/to/your/fasta/output/file";
>>>
>>> # Use a hash to keep a record of accessions we want to find
>>> my %hash_of_req_acc;
>>>
>>> # read all the required accessions from the file into the hash as keys
>>> open(my $acc_fh, '<', $acc_file) or die "Couldn't open file: $!\n";
>>> while (my $line = <$acc_fh>) {
>>>     chomp $line;
>>>     next unless length $line;   # skip blank lines
>>>     # use the chomped line as the key, not $_ (which keeps its newline)
>>>     $hash_of_req_acc{$line} = 1;
>>> }
>>> close $acc_fh;
>>>
>>> my $seqio_object_in = Bio::SeqIO->new(
>>>     -file => $fasta_file_in,
>>>     -format => 'fasta'
>>> );
>>> # the leading ">" opens the output file for writing
>>> my $seqio_object_out = Bio::SeqIO->new(
>>>     -file => ">$fasta_file_out",
>>>     -format => 'fasta'
>>> );
>>>
>>> # loop through all the sequences in the fasta file
>>> while (my $seq_object = $seqio_object_in->next_seq) {
>>>     # get the sequence accession for easy matching
>>>     # (for plain fasta headers this may not be set, in which case the
>>>     # ID has to be parsed out of $seq_object->display_id instead)
>>>     my $seq_acc = $seq_object->accession_number;
>>>
>>>     # write the sequence object to the output fasta file if we have a
>>>     # matching accession
>>>     $seqio_object_out->write_seq($seq_object)
>>>         if exists $hash_of_req_acc{$seq_acc};
>>> }
>>> -- end script --
>>>
>>> I haven't tested this, but it should at least get you started.
>>> Also, the fasta description line in the output file may not be
>>> exactly as it was in the input fasta file - if this really matters,
>>> you may need to get back to us. Also, if the input fasta file is
>>> huge (many thousands of sequences) it may be wise to create an
>>> index of the fasta file in order to speed up retrieval.
>>>
>>> You may find this page helpful:
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>
>>> Anyway, hope this helps to get you started.
>>> Nath
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org