[Bioperl-l] [Fwd: Re: A question about Bio::Index and its correct use]

Tue Nov 10 10:43:40 UTC 2009

Hello again,

I tried Mark's suggestion and changed the line he pointed out, but there's
still a problem, which I believe is due to the sequence names.
The sequence headers in my FASTA file have this format:

>PleosPC9_1_103820|fgenesh1_pg.3_#_1

The part to the right of the pipe changes depending on the program used to
create the gene model, for example:

>PleosPC9_1_103820|fgenesh1_pg.3_#_1
>PleosPC9_1_123413|genemark.2731_g
>PleosPC9_1_52065|e_gw1.3.64.1

So I guess I need to parse my IDs somehow, so that the program uses only
the first part of the FASTA header (the "protein name") and ignores
everything to the right of the pipe...
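A minimal plain-Perl sketch of the parsing step. The hypothetical `get_ids` sub takes a header line and returns the text before the pipe as the main key, plus the full first word as a second key, so either form could be looked up. As far as I know, Bio::Index::Fasta by default keys each record on the first whitespace-delimited word after `>`, which here is the whole `PleosPC9_1_103820|fgenesh1_pg.3_#_1` string. Its documentation also describes an `id_parser` method that accepts a code reference like this one and indexes every ID it returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a FASTA header line, return the retrieval keys.
# First key: the "protein name" before the pipe.
# Second key: the full first word, so the long form still works too.
sub get_ids {
    my ($header) = @_;
    my ($full) = $header =~ /^>\s*(\S+)/ or return;   # not a header line
    my ($name) = split /\|/, $full, 2;                # text before the pipe
    return $name eq $full ? ($full) : ($name, $full);
}

for my $h ('>PleosPC9_1_103820|fgenesh1_pg.3_#_1',
           '>PleosPC9_1_123413|genemark.2731_g',
           '>PleosPC9_1_52065|e_gw1.3.64.1') {
    my @ids = get_ids($h);
    print "$ids[0]\n";
}
```

In a BioPerl indexing script the idea would be to call `$inx->id_parser(\&get_ids);` before `make_index` and then rebuild the index, since keys are fixed when the index is written. Check the Bio::Index::Fasta documentation for the installed version before relying on this.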

This is the corrected code I wrote following Mark's advice, but I
still have no idea how to handle the parsing issue...

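#!/c:/Perl -w
use strict;
use Bio::Index::Fasta;
use Bio::SeqIO;
#PC9.fasta is my genomic file; its index was built as PC9.fasta.idx
my $Index_File_Name = "PC9.fasta";
my $inx = Bio::Index::Fasta->new($Index_File_Name.".idx");
#LCS.txt is my sequences list, one ID per line
open(my $list, '<', 'LCS.txt') or die "Cannot open LCS.txt: $!";
my @ids;
while (my $line = <$list>) {
    $line =~ s/\s+$//;    # strip the newline and trailing blanks
    push @ids, $line if length $line;
}
close($list);
die "empty list\n" unless @ids;
# open the output once, in append mode, instead of once per sequence
my $out = Bio::SeqIO->new(-file => ">>index_extracted.fasta",
                          -format => 'fasta');
foreach my $id (@ids) {
    my $seqobj = $inx->fetch($id);
    if (defined $seqobj) {
        $out->write_seq($seqobj);
    }
    else {
        warn "$id not found in the index\n";
    }
}
exit;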
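A pitfall worth ruling out in scripts like these: Perl parses angle brackets around something that is neither a filehandle name nor a simple scalar, such as `<LCS.txt>`, as a filename glob rather than a read. So `@ARGV = <LCS.txt>` yields the single string `LCS.txt`, and `fetch` is then asked for an ID with that name. The self-contained sketch below shows the difference; the temporary directory and the two sample IDs are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a throwaway directory so the demo does not touch real files.
my $orig_dir = getcwd();
my $tmp_dir  = tempdir(CLEANUP => 1);
chdir $tmp_dir or die "Cannot chdir to $tmp_dir: $!";

# A two-line ID list, like LCS.txt
open(my $w, '>', 'LCS.txt') or die "Cannot write LCS.txt: $!";
print {$w} "PleosPC9_1_103820\nPleosPC9_1_52065\n";
close($w);

# Angle brackets around a non-identifier act as a glob:
# this yields the file NAME 'LCS.txt', not the lines inside it.
my @globbed = <LCS.txt>;

# Reading the lines requires opening the file first.
open(my $r, '<', 'LCS.txt') or die "Cannot read LCS.txt: $!";
chomp(my @ids = <$r>);
close($r);

chdir $orig_dir or die "Cannot chdir back: $!";

print "glob returned: @globbed\n";
print "file contains: @ids\n";
```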

Thanks in advance

P.S. Is there perhaps a faster way to extract those sequences using plain Perl?
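On the P.S., a minimal plain-Perl sketch, assuming the ID list holds the names before the pipe, one per line. It loads the IDs into a hash and streams the FASTA file once, copying every record whose key is in the hash. The sub names `read_ids` and `extract_seqs` are hypothetical, and the demo uses in-memory strings with made-up sequences instead of PC9.fasta and LCS.txt. Whether this beats an index depends on the workload: a single pass is usually enough for a one-off extraction, while an index pays off when the same large file is queried many times.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line into a hash used as a lookup set.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                 # strip newline and trailing blanks
        $ids{$line} = 1 if length $line;
    }
    return \%ids;
}

# Hypothetical helper: copy every FASTA record whose key (the header text
# before the first '|' or whitespace) is in %$wanted.
# Returns the number of records copied.
sub extract_seqs {
    my ($in, $wanted, $out) = @_;
    my ($keep, $copied) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            my ($key) = $line =~ /^>\s*([^\s|]+)/;
            $keep = defined $key && exists $wanted->{$key};
            $copied++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $copied;
}

# Demo with in-memory files; the sequences are made up.
my $fasta = ">PleosPC9_1_103820|fgenesh1_pg.3_#_1\nMKVLLA\n"
          . ">PleosPC9_1_123413|genemark.2731_g\nMSSTQA\n"
          . ">PleosPC9_1_52065|e_gw1.3.64.1\nMATRLG\n";
my $list  = "PleosPC9_1_103820\nPleosPC9_1_52065\n";

open(my $list_fh,  '<', \$list)  or die "Cannot open ID list: $!";
open(my $fasta_fh, '<', \$fasta) or die "Cannot open FASTA: $!";
my $result = '';
open(my $out_fh, '>', \$result) or die "Cannot open output: $!";

my $n = extract_seqs($fasta_fh, read_ids($list_fh), $out_fh);
close($out_fh);
print "copied $n records:\n$result";
```

For the real files, the in-memory handles would be replaced by `open(my $fasta_fh, '<', 'PC9.fasta')`, `open(my $list_fh, '<', 'LCS.txt')` and `open(my $out_fh, '>', 'index_extracted.fasta')`.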

On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote:
> Yes, these are files created by SDBM, the DBM manager that ships with
> Perl. You should be able to open the index simply with
> $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
> and the DBM will know what to do.
> cheers MAJ
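To illustrate Mark's point with core Perl only: the sketch below ties a hash to `SDBM_File` under the base name `PC9.fasta.idx` in a temporary directory. It assumes, as Mark says, that Bio::Index used SDBM on José's system; the stored value is a toy string, not Bio::Index's real record format. The data land in `PC9.fasta.idx.pag` and `PC9.fasta.idx.dir`, no plain `PC9.fasta.idx` file appears, and reopening with the base name still finds the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use SDBM_File;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory so the demo leaves nothing behind
my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'PC9.fasta.idx');   # name passed to tie

# Create the database and store one toy key/value pair
tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot create $base: $!";
$db{'PleosPC9_1_103820'} = 'toy value';
untie %db;

# SDBM_File keeps the data in two files derived from the base name
for my $suffix ('', '.pag', '.dir') {
    printf "%-20s %s\n", "PC9.fasta.idx$suffix",
        (-e "$base$suffix" ? 'exists' : 'missing');
}

# Reopening with the base name alone finds the .pag/.dir pair
tie(my %ro, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot open $base: $!";
print "lookup: $ro{'PleosPC9_1_103820'}\n";
untie %ro;
```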
> ----- Original Message -----
> From: <jluis.lavin at unavarra.es>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <jluis.lavin at unavarra.es>; <bioperl-l at lists.open-bio.org>
> Sent: Thursday, November 05, 2009 11:21 AM
> Subject: Re: [Bioperl-l] [Fwd: Re: A question about Bio::Index and its correct use]
>
>
>> Thank you very much, Mark, that's a good point :$
>> I guess your correction refers to the second script, doesn't it?
>>
>> If so, there is still a problem with the first script: it doesn't
>> create the PC9.fasta.idx file; instead it creates two files named:
>> -PC9.fasta.idx.pag
>> -PC9.fasta.idx.dir
>>
>> which clearly seem to be related to some kind of indexing process...
>> but, unless the PC9.fasta.idx file is only virtual or remains hidden,
>> I can't find it anywhere...
>> Forgive me if I'm talking nonsense...
>>
>> Thank you very much again for your help ;)
>>
>>
>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote:
>>> Hey José,
>>> The first thing that jumps out is the index file name. It looks
>>> like you create it as
>>> PC9.fasta.idx
>>> but you read it as
>>> PC9.fasta
>>> Not an unusual mistake. Do
>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>>> and see if it works.
>>> MAJ
>>> ----- Original Message -----
>>> From: <jluis.lavin at unavarra.es>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Thursday, November 05, 2009 10:46 AM
>>> Subject: [Bioperl-l] [Fwd: Re: A question about Bio::Index and its correct use]
>>>
>>>
>>>
>>>
>>> ---------------------------- Original Message ----------------------------
>>> Subject: Re: [Bioperl-l] A question about Bio::Index and its correct use
>>> From:    jluis.lavin at unavarra.es
>>> Date:    Thu, 5 November 2009, 16:46
>>> To:      "Mark A. Jensen" <maj at fortinbras.us>
>>> --------------------------------------------------------------------------
>>>
>>> Hi Mark,
>>>
>>> I've actually got two scripts: the first one creates the index and
>>> the second one retrieves the sequence list from the indexed file.
>>>
>>> 1) Here is the index creation script:
>>>
>>> #!/c:/Perl -w
>>> use strict;
>>> use Bio::Index::Fasta;
>>>
>>> print "Enter file for indexing: \n";
>>> my $Index_File_Name = <STDIN>;
>>> chomp $Index_File_Name;    # drop the trailing newline read from STDIN
>>> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx",
>>>     -write_flag => 1);
>>> # index the FASTA file whose name was entered above
>>> $inx->make_index($Index_File_Name);
>>>
>>> 2) And here is the sequence retrieval script:
>>>
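>>> #!/c:/Perl -w
>>> use strict;
>>> use Bio::Index::Fasta;
>>> use Bio::SeqIO;
>>> #PC9.fasta is my genomic file
>>> my $Index_File_Name ="PC9.fasta";
>>> my $inx = Bio::Index::Fasta->new($Index_File_Name);
>>> #LCS.txt is my sequences list, one ID per line
>>> open(my $list, '<', 'LCS.txt') or die "Cannot open LCS.txt: $!";
>>> my @ids;
>>> while (my $line = <$list>) {
>>>     $line =~ s/\s+$//;    # strip the newline and trailing blanks
>>>     push @ids, $line if length $line;
>>> }
>>> close($list);
>>> die "empty list\n" unless @ids;
>>> my $out = Bio::SeqIO->new(-file => ">>extracted_seqs.fasta",
>>>                           -format => 'fasta');
>>> foreach my $id (@ids) {
>>>     my $seqobj = $inx->fetch($id);
>>>     if (defined $seqobj) {
>>>         $out->write_seq($seqobj);
>>>     }
>>>     else {
>>>         warn "$id not found in the index\n";
>>>     }
>>> }
>>> exit;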
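For readers wondering what an index like the one in these scripts stores: as I understand it, a FASTA index essentially maps each ID to a byte position and length in the original file, so a lookup is a seek plus a short read rather than a scan. The sketch below is a plain-Perl illustration of that idea using an in-memory hash, not Bio::Index's actual on-disk format; the helper names are hypothetical, keys are the header text before the pipe, and the sequences are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTA filehandle once and record, for each
# record, the byte offset of its header line and the record's length.
# Keys are the header text before the first '|' or whitespace.
sub build_offset_index {
    my ($fh) = @_;
    my (%index, $id, $start);
    while (1) {
        my $pos  = tell($fh);
        my $line = <$fh>;
        if (!defined $line || $line =~ /^>/) {
            # a new header (or end of file) closes the previous record
            $index{$id} = [$start, $pos - $start] if defined $id;
            last unless defined $line;
            ($id) = $line =~ /^>\s*([^\s|]+)/;
            $start = $pos;
        }
    }
    return \%index;
}

# Hypothetical helper: jump to a stored offset and read that record back.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    seek($fh, $offset, 0) or die "seek failed: $!";
    read($fh, my $buf, $length) == $length or die "short read";
    return $buf;
}

# Demo on an in-memory file; the sequences are made up.
my $fasta = ">PleosPC9_1_103820|fgenesh1_pg.3_#_1\nMKV\nLLA\n"
          . ">PleosPC9_1_123413|genemark.2731_g\nMSS\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my $index = build_offset_index($fh);
print fetch_record($fh, $index, 'PleosPC9_1_123413');
```

With a real file, opening it with `open(my $fh, '<', 'PC9.fasta')` and then calling `binmode($fh)` keeps offsets and lengths in bytes; without `binmode`, CRLF translation on Windows can make `read` run past the end of a record.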
>>>
>>> I hope this code isn't total rubbish...
>>>
>>> Thanks in advance ;)
>>>
>>>
>>>
>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote:
>>>> José, it looks like this is a good solution to your problem. Please
>>>> send your script so we can look at it.
>>>> cheers Mark
>>>> ----- Original Message -----
>>>> From: <jluis.lavin at unavarra.es>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Thursday, November 05, 2009 10:28 AM
>>>> Subject: [Bioperl-l] A question about Bio::Index and its correct use
>>>>
>>>>
>>>>
>>>> Hello to all,
>>>>
>>>> I'm trying to write a script to retrieve a list of sequences from a
>>>> local FASTA file (for example, a FASTA file where all the protein
>>>> models of an organism are stored). I would use this file as a kind of
>>>> "local database" (sorry if I mix up a few concepts...).
>>>> I've been reading the BioPerl HOWTOs and came across the
>>>> Bio::Index::Fasta module.
>>>> If I understood correctly what I read (which may not be the case,
>>>> given my limited programming experience), this indexing tool should
>>>> do the job.
>>>> I wrote a couple of scripts based on the documentation I read about
>>>> this tool, but I can't seem to create the index file to be used
>>>> later (to retrieve the sequences from).
>>>> - First of all, I want to ask the people on this list whether
>>>> Bio::Index::Fasta is the right tool to choose for this task.
>>>> - Then I'd ask you to take a look at my scripts, because I can't seem
>>>> to find the bug...
>>>>
>>>> Best wishes to you all and thanks in advance ;)
>>>>
>>>> --
>>>> José Luis Lavín Trueba, PhD
>>>>
>>>> Dpto. de Producción Agraria
>>>> Grupo de Genética y Microbiología
>>>> Universidad Pública de Navarra
>>>> 31006 Pamplona
>>>> Navarra
>>>> SPAIN
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN