[Bioperl-l] problem with swissprot parsing

Siddhartha Basu basu at pharm.sunysb.edu
Thu Oct 14 16:15:10 EDT 2004


Hi Brian,
Here is the code that produces the warnings below. Since it passes
-index => "bdb", I presume I am using Bio::DB::Flat::BDB, though I haven't
called it directly. I am trying to index swissprot/trembl files here.

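#!/usr/bin/perl -w
use strict;
use Bio::DB::Flat;

# Usage: index.pl file1.dat [file2.dat ...]
die "no files\n" unless @ARGV;
my $LOCATION = "/home/basu/odbaindex";

# -index => "bdb" selects the Berkeley DB backend (Bio::DB::Flat::BDB);
# -write_flag => 1 opens the index for writing.
my $db = Bio::DB::Flat->new( -directory  => $LOCATION,
                             -dbname     => "swissall",
                             -format     => "swiss",
                             -index      => "bdb",
                             -write_flag => 1,
                           ) or die "can't create BioFlat indexes\n";

# Index every file named on the command line.
$db->build_index(@ARGV);
print "Done indexing\n";

exit;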
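The earlier message (quoted below) reports that the index builds but
lookups return no Seq object. A quick read-back check can help separate
an indexing problem from a parsing problem. This is a minimal sketch
under stated assumptions. It reuses the directory and dbname from the
script above. It assumes that get_Seq_by_id looks entries up by the
ID-line entry name; I believe that is the primary key for the swiss
format, but I have not verified it. The helper swiss_ids is
hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect entry names from the ID lines of a
# Swiss-Prot flat file.
sub swiss_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        push @ids, $1 if $line =~ /^ID\s+(\S+)/;
    }
    return @ids;
}

# Usage: check_index.pl swiss.test
if (@ARGV) {
    require Bio::DB::Flat;    # loaded at run time; needs BioPerl installed
    my $db = Bio::DB::Flat->new(-directory => "/home/basu/odbaindex",
                                -dbname    => "swissall",
                                -format    => "swiss",
                                -index     => "bdb");
    open my $fh, '<', $ARGV[0] or die "can't open $ARGV[0]: $!\n";
    for my $id (swiss_ids($fh)) {
        # Assumption: the index is keyed on the ID-line name.
        # eval guards against a lookup that dies instead of returning undef.
        my $seq = eval { $db->get_Seq_by_id($id) };
        print $id, ($seq ? " found\n" : " NOT FOUND\n");
    }
}
```

If every ID prints NOT FOUND while indexing reported no errors, the
problem is more likely in the index or its key than in the parser
warnings alone.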


I get the following warnings:
======================================================================
Use of uninitialized value in substitution (s///) at
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
18676877.
Use of uninitialized value in substitution (s///) at
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
18676916.
Use of uninitialized value in substitution (s///) at
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
18676956.
Use of uninitialized value in substitution (s///) at
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
18677002.
Use of uninitialized value in substitution (s///) at
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
======================================================================

I also ran a small test with the Bio::SeqIO module on a small test
file (swiss.test). Here is the code:

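#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

die "usage: $0 swiss.test\n" unless @ARGV;

# $in is the input stream; each call to next_seq returns one Seq object
my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");

while (my $seq = $in->next_seq) {
    print $seq->id, "\n";
}

exit;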
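On an 18-million-line file, it is hard to tell which entry triggered
which warning. This minimal sketch uses only core Perl's
$SIG{__WARN__} hook. It captures the warnings raised while each record
is parsed and prints them under that record's ID. The helper name
capture_warnings is hypothetical. Bio::SeqIO is loaded with require
inside the usage branch, so the helper itself works without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code and return the warnings it emits
# instead of letting them print to STDERR.
sub capture_warnings {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    $code->();
    return @caught;
}

# Usage: seqio_warnings.pl swiss.test
if (@ARGV) {
    require Bio::SeqIO;    # loaded at run time, so BioPerl is needed only here
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");
    while (1) {
        my $seq;
        my @w = capture_warnings(sub { $seq = $in->next_seq });
        last unless $seq;
        printf "%s\t%d warning(s)\n", $seq->id, scalar @w;
        print "    $_" for @w;    # each warning already ends in a newline
    }
}
```

Because the handler is installed with local, normal warning behaviour
returns as soon as each call to capture_warnings finishes.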


It gives the same warning:
Use of uninitialized value in substitution (s///) at 
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 28.
1433_CAEEL
Use of uninitialized value in substitution (s///) at 
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 87.
A4_CAEEL
Use of uninitialized value in substitution (s///) at 
/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 171.
AATC_CAEEL

I have also attached the test file.
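In the attached file, each reported line number (28, 87 and 171) falls
on the line immediately after a reference block that has an RG (group
author) line but no RA line. One example is reference [2] of
1433_CAEEL, which names only "THE C. ELEGANS SEQUENCING CONSORTIUM".
Nothing in this thread confirms it, but this suggests that line 855 of
swiss.pm runs a substitution on an author string that is undefined for
such references. The core-Perl sketch below needs no BioPerl. Its
helper, find_group_only_refs, is hypothetical. It lists those
references together with the line whose arrival closes each block, so
the numbers can be compared with the warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find reference blocks (RN ... RL) that have an RG
# line but no RA line. Returns one [entry_id, ref_number, rn_line,
# closing_line] per hit; closing_line is the line whose arrival ends the
# block, i.e. the line number Perl would report in a warning at that point.
sub find_group_only_refs {
    my ($fh) = @_;
    my ($id, $ref, @hits);

    # Close the current reference, recording it if it is RG-only.
    my $close = sub {
        push @hits, [ $id, $ref->{num}, $ref->{line}, $. ]
            if $ref && $ref->{rg} && !$ref->{ra};
        undef $ref;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^ID\s+(\S+)/) {
            my $new_id = $1;
            $close->();
            $id = $new_id;
        }
        elsif ($line =~ /^RN\s+\[(\d+)\]/) {
            my $num = $1;
            $close->();                  # a new RN ends the previous block
            $ref = { num => $num, line => $., ra => 0, rg => 0 };
        }
        elsif ($ref && $line =~ /^RA\s/) { $ref->{ra} = 1 }
        elsif ($ref && $line =~ /^RG\s/) { $ref->{rg} = 1 }
        elsif ($ref && $line !~ /^R[A-Z]\s/) {
            $close->();                  # first non-R* line ends the block
        }
    }
    $close->();                          # block still open at end of file
    return @hits;
}

# Usage: find_rg_refs.pl swiss.test
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!\n";
    printf "%s: reference [%d] (line %d) has RG but no RA; block ends at line %d\n", @$_
        for find_group_only_refs($in);
}
```

Run against the attached file exactly as shown here, it should report
1433_CAEEL, A4_CAEEL and AATC_CAEEL with closing lines 28, 87 and 171.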

I hope this gives some clue to the problem.
Thanks for the response.

siddhartha



Brian Osborne wrote:
> Siddhartha,
> 
> Bio::DB::Flat::BinarySearch or Bio::DB::Flat::BDB? Also, please show your
> code when you ask a question; it simplifies matters. For example, it would
> tell me which module you used, which file format, and so on. It also helps
> to attach the actual sequence files, or a smaller test file that shows
> the same error. Occasionally a question gets ignored simply because no
> one knows how to answer it: there isn't enough information in the
> message.
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Siddhartha Basu
> Sent: Thursday, October 14, 2004 2:51 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] problem with swissprot parsing
> 
> Hi,
> I have already described this problem on this mailing list but haven't
> gotten anybody's attention yet. I also asked the author of this module
> but haven't heard back. I really couldn't figure out how to solve this,
> so I am writing again. I have also tried replacing swiss.pm with the
> version from bioperl-live, but the problem persists. I understand that
> this module is maintained, so I don't think I am being ignored because
> of a maintenance issue.
> 
> I am trying to build a flat-file index of swissprot/trembl files using
> the Bio::DB::Flat module. However, I consistently get the following
> warnings during indexing:
> ======================================================================
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18676877.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18676916.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18676956.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677002.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677045.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677091.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677136.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677178.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677209.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
> 18677249.
> ========================================================================
> Although the indexing completes, I can't fetch any data from the index;
> it does not return any Seq object.
> I also get the same warnings when I read the swissprot file with the
> Bio::SeqIO module.
> I am using bioperl-1.4 and understand that the problem has something to
> do with the swissprot parser in the Bio::SeqIO module (Bio::SeqIO::swiss).
> Is there any fix or workaround for this problem?
> 
> Thanks in advance.
> 
> -siddhartha
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-------------- next part --------------
ID   1433_CAEEL     STANDARD;      PRT;   248 AA.
AC   P41932; Q21537;
DT   01-NOV-1995 (Rel. 32, Created)
DT   01-NOV-1995 (Rel. 32, Last sequence update)
DT   01-OCT-2004 (Rel. 45, Last annotation update)
DE   14-3-3-like protein 1.
GN   Name=ftt-1; ORFNames=M117.2;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC   Rhabditidae; Peloderinae; Caenorhabditis.
OX   NCBI_TaxID=6239;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=95011616; PubMed=7926802; DOI=10.1016/0378-1119(94)90068-X;
RA   Wang W., Shakes D.C.;
RT   "Isolation and sequence analysis of a Caenorhabditis elegans cDNA
RT   which encodes a 14-3-3 homologue.";
RL   Gene 147:215-218(1994).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=99069613; PubMed=9851916;
RG   THE C. ELEGANS SEQUENCING CONSORTIUM;
RT   "Genome sequence of the nematode C. elegans: a platform for
RT   investigating biology.";
RL   Science 282:2012-2018(1998).
CC   -!- SIMILARITY: Belongs to the 14-3-3 family.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; U05038; AAA61872.1; -.
DR   EMBL; Z73910; CAA98138.1; -.
DR   PIR; JC2581; JC2581.
DR   PIR; T23759; T23759.
DR   HSSP; P93343; 1O9E.
DR   IntAct; P41932; -.
DR   WormPep; M117.2; CE06200.
DR   InterPro; IPR000308; 14-3-3.
DR   Pfam; PF00244; 14-3-3; 1.
DR   PRINTS; PR00305; 1433ZETA.
DR   SMART; SM00101; 14_3_3; 1.
DR   PROSITE; PS00796; 1433_1; 1.
DR   PROSITE; PS00797; 1433_2; 1.
KW   Multigene family.
FT   CONFLICT    118    118       A -> V (in Ref. 2).
SQ   SEQUENCE   248 AA;  28162 MW;  B9350039628341AF CRC64;
     MSDTVEELVQ RAKLAEQAER YDDMAAAMKK VTEQGQELSN EERNLLSVAY KNVVGARRSS
     WRVISSIEQK TEGSEKKQQL AKEYRVKVEQ ELNDICQDVL KLLDEFLIVK AGAAESKAFY
     LKMKGDYYRY LAEVASEDRA AVVEKSQKAY QEALDIAKDK MQPTHPIRLG LALNFSVFYY
     EILNTPEHAC QLAKQAFDDA IAELDTLNED SYKDSTLIMQ LLRDNLTLWT SDVGAEDQEQ
     EGNQEAGN
//
ID   A4_CAEEL       STANDARD;      PRT;   686 AA.
AC   Q10651; Q18583; Q95ZX1;
DT   28-FEB-2003 (Rel. 41, Created)
DT   28-FEB-2003 (Rel. 41, Last sequence update)
DT   01-OCT-2004 (Rel. 45, Last annotation update)
DE   Beta-amyloid-like protein precursor.
GN   Name=apl-1; ORFNames=C42D8.8;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC   Rhabditidae; Peloderinae; Caenorhabditis.
OX   NCBI_TaxID=6239;
RN   [1]
RP   SEQUENCE OF 6-686 FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=94089766; PubMed=8265668;
RA   Daigle I., Li C.;
RT   "apl-1, a Caenorhabditis elegans gene encoding a protein related to
RT   the human beta-amyloid protein precursor.";
RL   Proc. Natl. Acad. Sci. U.S.A. 90:12045-12049(1993).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=99069613; PubMed=9851916;
RG   THE C. ELEGANS SEQUENCING CONSORTIUM;
RT   "Genome sequence of the nematode C. elegans: a platform for
RT   investigating biology.";
RL   Science 282:2012-2018(1998).
RN   [3]
RP   REVISIONS, AND ALTERNATIVE SPLICING.
RA   Waterston R.;
RL   Submitted (JUN-2001) to the EMBL/GenBank/DDBJ databases.
CC   -!- SUBCELLULAR LOCATION: Type I membrane protein (Potential).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
CC       Name=a;
CC         IsoId=Q10651-1; Sequence=Displayed;
CC       Name=b;
CC         IsoId=Q10651-2; Sequence=VSP_000017;
CC         Note=No experimental confirmation available;
CC   -!- SIMILARITY: Belongs to the APP family.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; U00240; AAC46470.1; ALT_INIT.
DR   EMBL; U56966; AAA98722.1; -.
DR   EMBL; U56966; AAK68242.1; -.
DR   PIR; T15795; T15795.
DR   HSSP; P05067; 1MWP.
DR   WormPep; C42D8.8a; CE04209.
DR   WormPep; C42D8.8b; CE27845.
DR   InterPro; IPR008155; A4_APP.
DR   InterPro; IPR008154; A4_extra.
DR   Pfam; PF02177; A4_EXTRA; 1.
DR   PRINTS; PR00203; AMYLOIDA4.
DR   SMART; SM00006; A4_EXTRA; 1.
DR   PROSITE; PS00319; A4_EXTRA; 1.
KW   Alternative splicing; Amyloid; Glycoprotein; Neurogenesis; Signal;
KW   Transmembrane.
FT   SIGNAL        1     21       Potential.
FT   CHAIN        22    686       Beta-amyloid-like protein.
FT   DOMAIN       22    621       Extracellular (Potential).
FT   TRANSMEM    622    642       Potential.
FT   DOMAIN      643    686       Cytoplasmic (Potential).
FT   DOMAIN      205    228       Asp-rich.
FT   DOMAIN      676    679       Clathrin-binding (Potential).
FT   CARBOHYD     84     84       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    201    201       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    249    249       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    417    417       N-linked (GlcNAc...) (Potential).
FT   VARSPLIC    538    539       Missing (in isoform b).
FT                                /FTId=VSP_000017.
SQ   SEQUENCE   686 AA;  79434 MW;  A0816858FDD48608 CRC64;
     MTVGKLMIGL LIPILVATVY AEGSPAGSKR HEKFIPMVAF SCGYRNQYMT EEGSWKTDDE
     RYATCFSGKL DILKYCRKAY PSMNITNIVE YSHEVSISDW CREEGSPCKW THSVRPYHCI
     DGEFHSEALQ VPHDCQFSHV NSRDQCNDYQ HWKDEAGKQC KTKKSKGNKD MIVRSFAVLE
     PCALDMFTGV EFVCCPNDQT NKTDVQKTKE DEDDDDDEDD AYEDDYSEES DEKDEEEPSS
     QDPYFKIANW TNEHDDFKKA EMRMDEKHRK KVDKVMKEWG DLETRYNEQK AKDPKGAEKF
     KSQMNARFQK TVSSLEEEHK RMRKEIEAVH EERVQAMLNE KKRDATHDYR QALATHVNKP
     NKHSVLQSLK AYIRAEEKDR MHTLNRYRHL LKADSKEAAA YKPTVIHRLR YIDLRINGTL
     AMLRDFPDLE KYVRPIAVTY WKDYRDEVSP DISVEDSELT PIIHDDEFSK NAKLDVKAPT
     TTAKPVKETD NAKVLPTEAS DSEEEADEYY EDEDDEQVKK TPDMKKKVKV VDIKPKEIKV
     TIEEEKKAPK LVETSVQTDD EDDDEDSSSS TSSESDEDED KNIKELRVDI EPIIDEPASF
     YRHDKLIQSP EVERSASSVF QPYVLASAMF ITAICIIAFA ITNARRRRAM RGFIEVDVYT
     PEERHVAGMQ VNGYENPTYS FFDSKA
//
ID   AATC_CAEEL     STANDARD;      PRT;   408 AA.
AC   Q22067;
DT   01-NOV-1997 (Rel. 35, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   01-OCT-2004 (Rel. 45, Last annotation update)
DE   Probable aspartate aminotransferase, cytoplasmic (EC 2.6.1.1)
DE   (Transaminase A) (Glutamate oxaloacetate transaminase-1).
GN   ORFNames=T01C8.5;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC   Rhabditidae; Peloderinae; Caenorhabditis.
OX   NCBI_TaxID=6239;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=99069613; PubMed=9851916;
RG   THE C. ELEGANS SEQUENCING CONSORTIUM;
RT   "Genome sequence of the nematode C. elegans: a platform for
RT   investigating biology.";
RL   Science 282:2012-2018(1998).
CC   -!- CATALYTIC ACTIVITY: L-aspartate + 2-oxoglutarate = oxaloacetate +
CC       L-glutamate.
CC   -!- COFACTOR: Pyridoxal phosphate (By similarity).
CC   -!- SUBUNIT: Homodimer (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cytoplasmic (Potential).
CC   -!- MISCELLANEOUS: In eukaryotes there are cytoplasmic, mitochondrial
CC       and chloroplastic isozymes.
CC   -!- SIMILARITY: Belongs to the class-I pyridoxal-phosphate-dependent
CC       aminotransferase family.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; U58726; AAB00578.1; -.
DR   PIR; T29857; T29857.
DR   HSSP; P00503; 1AJS.
DR   WormPep; T01C8.5; CE07462.
DR   InterPro; IPR004839; Aminotrans_I/II.
DR   InterPro; IPR000796; Asp_trans.
DR   InterPro; IPR004838; NHtransf_1_BS.
DR   Pfam; PF00155; Aminotran_1_2; 1.
DR   PRINTS; PR00799; TRANSAMINASE.
DR   PROSITE; PS00105; AA_TRANSFER_CLASS_1; 1.
KW   Aminotransferase; Pyridoxal phosphate; Transferase.
FT   BINDING     251    251       Pyridoxal phosphate (By similarity).
SQ   SEQUENCE   408 AA;  45493 MW;  A4DDCBCB8C0EFD83 CRC64;
     MSFFDGIPVA PPIEVFHKNK MYLDETAPVK VNLTIGAYRT EEGQPWVLPV VHETEVEIAN
     DTSLNHEYLP VLGHEGFRKA ATELVLGAES PAIKEERSFG VQCLSGTGAL RAGAEFLASV
     CNMKTVYVSN PTWGNHKLVF KKAGFTTVAD YTFWDYDNKR VHIEKFLSDL ESAPEKSVII
     LHGCAHNPTG MDPTQEQWKL VAEVIKRKNL FTFFDIAYQG FASGDPAADA WAIRYFVDQG
     MEMVVSQSFA KNFGLYNERV GNLTVVVNNP AVIAGFQSQM SLVIRANWSN PPAHGARIVH
     KVLTTPARRE QWNQSIQAMS SRIKQMRAAL LRHLMDLGTP GTWDHIIQQI GMFSYTGLTS
     AQVDHLIANH KVFLLRDGRI NICGLNTKNV EYVAKAIDET VRAVKSNI
//
