[BioPython] help with retrieving seq
Dinakar
Desai.Dinakar@mayo.edu
Thu, 01 Mar 2001 20:22:14 -0600
Brad and others:
Thank you very much. Today I tried to create a dictionary mapping each est_id
to a file marker, i.e. the byte offset of that record in the database file
(maybe not the best solution). Building it took about 1 hour on our Beowulf
cluster with 4 GB of memory (most of the time was spent reading the file, I
guess). I used Brad's example of a FASTA parser to create the dictionary. The
dictionary is about 60 MB (est_id as key, file marker as value). It takes
about 4 minutes to load the pickled dictionary, look up the key, seek to that
location in the database file and retrieve the sequence (I tried a sequence
at the end of the file). I used cPickle to load the dictionary. There must be
a better way to search such a big file. A friend of mine suggested using a
database to store the keys and file locations. Someone else suggested using
GDBM (from GNU). Does anyone have a better solution than what I am doing
now? (I am sure there are better solutions.)
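For comparison, here is a minimal sketch of the GDBM idea, assuming Python 3's standard `dbm` module (it uses GDBM through `dbm.gnu` when that backend is installed and otherwise falls back to another available backend). The ID-to-offset table lives on disk, so a lookup opens the index and reads one key instead of unpickling the whole 60 MB dictionary. The function names `build_index` and `fetch_record` are hypothetical, and the sketch assumes each record's ID is the first whitespace-delimited word after `>` on its header line.

```python
import dbm


def build_index(fasta_path, index_path):
    """Scan the FASTA file once, storing record ID -> byte offset of its '>' line."""
    with open(fasta_path, "rb") as handle, dbm.open(index_path, "n") as index:
        offset = handle.tell()
        line = handle.readline()
        while line:
            if line.startswith(b">"):
                # Assumed ID convention: first whitespace-delimited token after '>'.
                fields = line[1:].split(None, 1)
                if fields:  # skip a bare '>' line with no ID
                    index[fields[0]] = str(offset).encode("ascii")
            offset = handle.tell()
            line = handle.readline()


def fetch_record(fasta_path, index_path, record_id):
    """Return the FASTA record for record_id as text, or None if it is not indexed."""
    with dbm.open(index_path, "r") as index:
        value = index.get(record_id.encode())
    if value is None:
        return None
    with open(fasta_path, "rb") as handle:
        handle.seek(int(value))  # jump straight to the '>' header line
        lines = [handle.readline()]
        for line in handle:  # copy lines until the next header or end of file
            if line.startswith(b">"):
                break
            lines.append(line)
    return b"".join(lines).decode()
```

After a one-time `build_index("ests.fa", "ests.idx")` (hypothetical file names), each `fetch_record("ests.fa", "ests.idx", some_est_id)` is one key lookup plus one seek, so records near the end of the file should cost about the same as records near the start. The standard `sqlite3` module would be a comparable way to try the "use a database" suggestion.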
I hope to hear from you soon.
Thank you.
Dinakar
Brad Chapman wrote:
> Hi Dinakar;
>
> [Finding records in FASTA files]
> > It works well for sequences that are close to the start of the file, but if
> > the sequence is towards the end, it takes almost forever (I mean it is
> > slow).
>
> Yup, definitely true -- if you have really big files, this probably
> isn't the best approach.
>
> > Is there any indexing technique? I was thinking I should create
> > some sort of index, because I will be doing this quite often and that way
> > the search can be really fast. Or is there an efficient method of searching
> > the EST database? Does anyone have any suggestions regarding indexing?
>
> You probably want to check out the next section in the Tutorial:
>
> 2.4.4. FASTA files as Dictionaries
>
> The example there is actually of indexing a FASTA file using accession
> numbers. This sounds really close to what you need. Let us know if you
> have problems modifying the example to fit in your actual case. BTW,
> the example code is in Doc/examples/fasta_dictionary.py if you want to
> start from that.
>
> Hope this helps,
> Brad
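The dictionary idea Brad points to can be sketched with the standard library alone; this is an independent illustration, not the tutorial's `fasta_dictionary.py`. The hypothetical `FastaDict` class scans the file once, keeps only each record's byte offset in memory, and seeks straight to a record when it is requested, so lookups should not slow down for records near the end of the file. It assumes the ID is the first whitespace-delimited word of each `>` header. (Later Biopython releases also provide `Bio.SeqIO.index` for this kind of lookup.)

```python
from collections.abc import Mapping


class FastaDict(Mapping):
    """Read-only dict-like view of a FASTA file: ID -> record text, via byte offsets."""

    def __init__(self, fasta_path):
        self._path = fasta_path
        self._offsets = {}
        # One sequential pass: remember where each '>' header line starts.
        with open(fasta_path, "rb") as handle:
            offset = handle.tell()
            line = handle.readline()
            while line:
                if line.startswith(b">"):
                    fields = line[1:].split(None, 1)
                    if fields:  # skip a bare '>' line with no ID
                        self._offsets[fields[0].decode()] = offset
                offset = handle.tell()
                line = handle.readline()

    def __getitem__(self, record_id):
        offset = self._offsets[record_id]  # raises KeyError for unknown IDs
        with open(self._path, "rb") as handle:
            handle.seek(offset)
            lines = [handle.readline()]  # the header line itself
            for line in handle:  # sequence lines up to the next header
                if line.startswith(b">"):
                    break
                lines.append(line)
        return b"".join(lines).decode()

    def __iter__(self):
        return iter(self._offsets)

    def __len__(self):
        return len(self._offsets)
```

Subclassing `Mapping` supplies `in`, `get()` and `keys()` for free. The trade-off against the on-disk variant is that the offset table must be rebuilt (or pickled) each time the program starts.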