[Bioperl-l] need help ??parse AcNum from fasta?

Tue Oct 2 17:47:07 EDT 2007

Thanks for this, but I just want to create a new FASTA file for my accession
numbers, which I look up in the FASTA file (a local database).
So: search for these numbers in the FASTA file and, if one is found, copy its
header and sequence to a new FASTA file.
I have been stuck on this for 2 days now; I don't think it is difficult, but how?
It is driving me crazy, guys.
Come on, is there an expert in this area?
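
For reference, the task described above can be sketched in plain Perl without BioPerl. This is only a minimal sketch, not a tested solution for the real IPI file: the helper name filter_fasta is hypothetical, the headers are assumed to look like ">IPI:IPI00453473.1|..." as in the sample quoted below, and the residues in the demo are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: return the FASTA records in $fasta_text whose header
# carries one of the accessions listed in @$wanted.
# Assumption: headers start with ">IPI:<accession>.<version>|".
sub filter_fasta {
    my ($fasta_text, $wanted) = @_;
    my %want = map { $_ => 1 } @$wanted;

    my $out  = '';
    my $keep = 0;
    for my $line (split /^/m, $fasta_text) {
        if ($line =~ /^>/) {
            # Header line: decide whether this whole record is wanted.
            my ($acc) = $line =~ /^>IPI:(IPI\d+)/;
            $keep = (defined($acc) && exists $want{$acc}) ? 1 : 0;
        }
        # Copy header and sequence lines of wanted records unchanged.
        $out .= $line if $keep;
    }
    return $out;
}

# Demo with placeholder residues (not real sequence data).
my $fasta = join '',
    ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein\n",
    "AAAA\n",
    ">IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein\n",
    "CCCC\n";
print filter_fasta($fasta, ['IPI00453473']);
```

For real files, the accession list and the FASTA text would be read from disk and the result written to the new FASTA file; because the records are copied line by line, the headers stay exactly as they were in the input.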

Smithies, Russell wrote:

>I know this is the Bioperl list but how about just doing it with grep?
>
>	grep -P '^>.*XM_001666470[\s^>]*' sequences.fasta
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of outaleb Issame
>>Sent: Wednesday, 3 October 2007 3:51 a.m.
>>To: outaleb Issame
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta?
>>
>>hi again,
>>I think I can solve this problem with the method id_parser();
>>how can I do that?
>>Any suggestions or experience?
>>thx again
>>
>>outaleb Issame wrote:
>>
>>>thx for the help, but I got an empty output file.
>>>I think the problem is with matching the accession number; my FASTA file looks like:
>>>
>>>>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>>>DDHHHU...
>>>>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>>>DDHHHU..
>>>>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
>>>MMMMM..
>>>
>>>and my accession number file looks like:
>>>IPI00177321
>>>IPI00453473
>>>
>>>I hope this helps to understand.
>>>
>>>Nathan S. Haigh wrote:
>>>
>>>>outaleb Issame wrote:
>>>>
>>>>>hi,
>>>>>By this file I mean: I picked these accession numbers out of the
>>>>>IPI-Human database (they come from a FASTA file),
>>>>>so they are now listed one under another, like a table, in a separate file.
>>>>>What I want is to check whether they are really present in the FASTA file
>>>>>(the IPI-Human FASTA file);
>>>>>if so, copy the entire entry for that number (the ">..." header and the
>>>>>sequence as well) into a new FASTA file, so that at the end I get a new
>>>>>FASTA file with just these IPI accession numbers.
>>>>>thx, and I hope that was clear.
>>>>>
>>>>Ok, first of all, I'd read the contents of your Accession numbers into a
>>>>hash, something like the following (this could be written in a shorter
>>>>form, but since you're a newbie I'll leave it in a longer form so you
>>>>can follow easier).
>>>>
>>>>-- start script --
>>>>use strict;
>>>>use Bio::SeqIO;
>>>>
>>>># change the following three lines to point to the relevant paths
>>>># of your list of accessions file, your fasta file and your output
>>>># fasta file
>>>>my $acc_file = "/path/to/your/file";
>>>>my $fasta_file_in = "/path/to/your/fasta/file";
>>>>my $fasta_file_out = "/path/to/your/fasta/output/file";
>>>>
>>>># Use a hash to keep a record of accessions we want to find
>>>>my %hash_of_req_acc;
>>>>
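>>>># read all the required accessions from the file into the hash as keys
>>>>open(my $acc_fh, '<', $acc_file) or die "Couldn't open file: $!\n";
>>>>while (my $line = <$acc_fh>) {
>>>>    chomp $line;
>>>>    $line =~ s/^\s+|\s+$//g;    # drop stray whitespace (e.g. a trailing \r)
>>>>    next unless length $line;   # skip blank lines
>>>>    # use the chomped line as the key; $_ would still carry the newline
>>>>    $hash_of_req_acc{$line} = 1;
>>>>}
>>>>close $acc_fh;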
>>>>
>>>>my $seqio_object_in = Bio::SeqIO->new(
>>>>-file => $fasta_file_in,
>>>>-format => 'fasta'
>>>>);
>>>># the leading ">" opens the output file for writing
>>>>my $seqio_object_out = Bio::SeqIO->new(
>>>>-file => ">$fasta_file_out",
>>>>-format => 'fasta'
>>>>);
>>>>
>>>># loop through all the sequences in the fasta file
>>>>while (my $seq_object = $seqio_object_in->next_seq) {
>>>>    # accession_number is not filled in from a FASTA header (it stays
>>>>    # "unknown"), so take the accession from the ID instead, assuming
>>>>    # IDs of the form "IPI:IPI00453473.1|..."
>>>>    my ($seq_acc) = $seq_object->display_id =~ /^IPI:(IPI\d+)/;
>>>>    next unless defined $seq_acc;
>>>>
>>>>    # write the sequence object to the output fasta file if we have a
>>>>    # matching accession
>>>>    $seqio_object_out->write_seq($seq_object) if exists $hash_of_req_acc{$seq_acc};
>>>>}
>>>>-- end script --
>>>>
>>>>I haven't tested this, but it should at least get you started. Also, the
>>>>fasta description line in the output file may not be exactly as it was
>>>>in the input fasta file - if this really matters, you may need to get
>>>>back to us. Also, if the input fasta file is huge (many thousands of
>>>>sequences) it may be wise to create an index of the fasta file in order
>>>>to speed up retrieval.
>>>>
>>>>You may find this page helpful:
>>>>http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>>
>>>>Anyway, hope this helps to get you started.
>>>>Nath
>>>>
>>>>
>>>>_______________________________________________
>>>>Bioperl-l mailing list
>>>>Bioperl-l at lists.open-bio.org
>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l