[Bioperl-l] reading blast report

Jason Stajich jason at bioperl.org
Thu Jan 14 16:28:29 EST 2010


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading? You might consider
>> generating the blast report as tab-delimited (-m 8 format) if you are
>> only interested in the start/end positions and scores of alignments;
>> that is a simpler, reduced dataset that gives the parser a much lower
>> memory footprint.
>
> I think this would be a better approach; I am mostly interested in the
> start/end/score data only.
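
A minimal sketch of that tab-delimited route ('blasttable' is the
SearchIO parser for -m 8 / -m 9 tabular output; the file name here is
hypothetical):

  use strict;
  use warnings;
  use Bio::SearchIO;

  my $searchio = Bio::SearchIO->new(
      -format => 'blasttable',   # parses tabular (-m 8 / -m 9) reports
      -file   => 'report.m8',    # hypothetical input file
  );

  while ( my $result = $searchio->next_result ) {
      while ( my $hit = $result->next_hit ) {
          while ( my $hsp = $hit->next_hsp ) {
              # tabular output carries just coordinates and scores
              print join( "\t",
                  $result->query_name,  $hit->name,
                  $hsp->start('query'), $hsp->end('query'),
                  $hsp->start('hit'),   $hsp->end('hit'),
                  $hsp->bits,           $hsp->evalue ), "\n";
          }
      }
  }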
>
>>
>> SearchIO defaults to -format => 'blast'; you can try -format =>
>> 'blast_pull' instead, which parses lazily when creating objects and
>> will reduce memory consumption.
>
> That is another good option. But just out of curiosity: does the
> regular blast parser load the entire file into memory, given that the
> output consists of multiple results concatenated together into a
> single file? Could anybody clarify?
>
> thanks,
> -siddhartha

With the standard (non-pull) approach, each result is parsed in turn (1
result per query) and all of its hits and HSPs are brought into memory.
SearchIO iterates at the level of the result - that is why you call
next_result, which parses one result at a time; only the current result
is held in memory, not the whole file.
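
A minimal sketch of that iteration model (the file name is
hypothetical; swapping between the eager and lazy parsers is just a
change to -format):

  use strict;
  use warnings;
  use Bio::SearchIO;

  my $searchio = Bio::SearchIO->new(
      -format => 'blast_pull',          # or 'blast' for the eager parser
      -file   => 'tblastn_report.bls',  # hypothetical input file
  );

  # next_result hands back one result at a time; with 'blast_pull' the
  # hits and HSPs are only materialized when you actually ask for them
  while ( my $result = $searchio->next_result ) {
      while ( my $hit = $result->next_hit ) {
          while ( my $hsp = $hit->next_hsp ) {
              printf "%s\t%d\t%d\t%s\n", $hit->name,
                  $hsp->start('hit'), $hsp->end('hit'), $hsp->score;
          }
      }
  }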

>
>
>>
>> -jason
>> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>>
>>> Hi,
>>> I have a script that reads a tblastn report (13,000 records) and
>>> loads it into a chado database (via the Bio::Chado::Schema module);
>>> however, the machine runs out of memory. Setting the database-loading
>>> side apart, I am trying to figure out whether reading with the
>>> SearchIO module could consume a lot of memory. So, when I am reading a
>>> blast file and getting the result object ....
>>>
>>> while (my $result = $searchio->next_result)
>>>
>>> * Does the searchio object load a huge chunk of the file into memory,
>>> or does each iteration read only the part belonging to one result?
>>>
>>> * Would building an index on the blast report and then reading from
>>> it be much faster, and why? Also, is there any way I could iterate
>>> through each record in the index, and would that be helpful?
>>>
>>> -siddhartha
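
On the indexing question above: an index (e.g. Bio::Index::Blast)
speeds up random access to individual results by query ID rather than
sequential iteration. A minimal sketch, with hypothetical file names
and query ID:

  use strict;
  use warnings;
  use Bio::Index::Blast;

  # build the on-disk index once
  my $index = Bio::Index::Blast->new(
      -filename   => 'report.bls.idx',   # hypothetical index file
      -write_flag => 1,
  );
  $index->make_index('report.bls');      # hypothetical report file

  # later: fetch a single result by query ID without rescanning the file
  my $result = $index->fetch_report('some_query_id');
  print $result->query_name, " has ", $result->num_hits, " hits\n";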

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/



