[Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails

Chris Fields cjfields at uiuc.edu
Wed Sep 6 12:31:01 UTC 2006


Daniel,

Could you add a bug report to Bugzilla that describes the problems  
(conversion to STANDARD)?

Attach an example input file with PRELIMINARY and an example output  
file which ends up as STANDARD.
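
If it helps with putting together the example input, here is a rough sketch for pulling the first PRELIMINARY record out of a large .dat file. It is plain Python rather than BioPerl, so it does not depend on the parser under suspicion. It assumes the ID line layout shown in your grep output, where the status is the third whitespace-separated token, and that every record ends with a line starting with "//". The function name is made up for illustration.

```python
def first_record_with_status(handle, status="PRELIMINARY"):
    """Return the first complete flat-file record whose ID line has `status`.

    `handle` is any iterable of text lines (for example an open .dat file).
    Returns the record as one string, or None if no complete match is found.
    """
    record = []
    keep = False
    for line in handle:
        if line.startswith("ID "):
            # Assumed layout: ID   ENTRY_NAME   STATUS;   PRT;   NNN AA.
            fields = line.split()
            keep = len(fields) > 2 and fields[2].rstrip(";") == status
            record = []
        if keep:
            record.append(line)
            # "//" terminates a record in the flat-file format.
            if line.startswith("//"):
                return "".join(record)
    return None
```

Called on an open handle for uniprot_trembl.dat, this should return the text of the first matching record, which can then be written to a small file and attached to the bug report.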

The lack of a warning when retrieval fails is odd but not unheard of.
Specifically, something may be wrong with the way biofetch retrieves
remote data (rather than with Bio::Registry). We can probably add a
check for that.

Chris

On Sep 6, 2006, at 4:11 AM, Daniel Lang wrote:

> Hi Brian,
>
> I'm now iterating over all uniprot_trembl sequences and recording the
> ones for which retrieval fails. Let's see if STANDARDs also fail...
>
> How is the second field of the swissprot ID line handled anyway? I ask
> because PRELIMINARYs end up as STANDARD when parsed by
> Bio::SeqIO::swiss.
>
> On the other hand, I'm still confused about why there's no error or
> warning when the retrieval fails. Can you give me a hint which modules
> (besides swiss.pm) to look at?
>
> Cheers,
> Daniel
>
> Brian Osborne wrote:
>> Daniel,
>>
>> Well, if you can isolate the bug please add it to bugzilla.
>>
>> Brian O.
>>
>>
>> On 9/5/06 5:57 AM, "Daniel Lang"
>> <daniel.lang at biologie.uni-freiburg.de> wrote:
>>
>>> Hi Brian,
>>>
>>> Sorry for the belated response!
>>> I've compiled a set of 100 PRELIMINARY entries for you from the latest
>>> uniprot_trembl release. I tried to reproduce the bug using only these
>>> as input to build an index, but (sadly) all of them can be retrieved
>>> using the latest checkout :-(
>>> Maybe it's not connected to these entries after all, but to the size
>>> or some other feature of the uniprot distribution?
>>> I could now make it work using the 1.5.1 release.
>>>
>>> Originally, I built the index using the flat protocol. When I try bdb
>>> and bioperl-live, even more problems occur:
>>>
>>> bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c uniprot_sprot.dat
>>>
>>> ------------- EXCEPTION  -------------
>>> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
>>> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
>>> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
>>> non-consecutive nodes with the same name. Can't cope!
>>> STACK Bio::DB::Taxonomy::list::add_lineage
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163
>>> STACK Bio::DB::Taxonomy::list::new
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100
>>> STACK Bio::DB::Taxonomy::new
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106
>>> STACK Bio::Species::classification
>>> /home/lang/bioperl/bioperl-live/Bio/Species.pm:171
>>> STACK Bio::SeqIO::swiss::_read_swissprot_Species
>>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049
>>> STACK Bio::SeqIO::swiss::next_seq
>>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240
>>> STACK Bio::DB::Flat::parse_one_record
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333
>>> STACK Bio::DB::Flat::BDB::_index_file
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235
>>> STACK Bio::DB::Flat::BDB::build_index
>>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218
>>> STACK toplevel
>>> /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl:113
>>>
>>> But I think this is connected to the recent changes to taxonomy
>>> handling in Bio::Taxon...
>>> I'm unsure whether to submit this separately, but I could also provide
>>> an example of a swissprot entry that causes this error.
>>>
>>> Thanks, again.
>>>
>>> Daniel
>>>
>>> Brian Osborne wrote:
>>>> Daniel,
>>>>
>>>> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with
>>>> such a PRELIMINARY entry?
>>>>
>>>> Brian O.
>>>>
>>>>
>>>> On 9/1/06 6:11 AM, "Daniel Lang"
>>>> <daniel.lang at biologie.uni-freiburg.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> When using Bio::Registry (bioperl-live) to fetch uniprot entries
>>>>> from locally indexed uniprot *.dat files, I noticed that several
>>>>> entries could not be retrieved even though they are present in the
>>>>> files! A closer look reveals that they have the status PRELIMINARY:
>>>>>
>>>>> uniprot_trembl.dat:ID   Q16EZ1_AEDAE   PRELIMINARY;   PRT;   222 AA.
>>>>>
>>>>> Grepping for PRELIMINARY finds nothing anywhere in my cvs checkout.
>>>>> I also can't retrieve the sequences from the online database defined
>>>>> as follows:
>>>>> [swissprot_ebi]
>>>>> protocol=biofetch
>>>>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch
>>>>> dbname=swall
>>>>>
>>>>> Is this a bug or a feature? If it's a feature, how can I bypass it?
>>>>>
>>>>> Thanks in advance,
>>>>> Daniel

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





