[Biojava-dev] [Bug 2687] New: UniProt: Tags in feature continuation lines may be lost.

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Nov 25 23:31:57 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2687

           Summary: UniProt: Tags in feature continuation lines may be lost.
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: seq.io
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: jan at biochemfusion.com


I am using BioJava 1.6.1 on Windows XP to read in a UniProt file and write it
back out. The main code that does this is:

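        import java.io.BufferedReader;
        import java.io.FileReader;
        import org.biojavax.SimpleNamespace;
        import org.biojavax.bio.seq.RichSequence;
        import org.biojavax.bio.seq.RichSequenceIterator;

        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        SimpleNamespace ns = new SimpleNamespace("biojava");

        RichSequenceIterator rsi = RichSequence.IOTools.readUniProt(br, ns);
        RichSequence rs = rsi.nextRichSequence();
        RichSequence.IOTools.writeUniProt(System.out, rs, ns);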

When reading in this heavily abridged version of FA9_BOVIN from www.uniprot.org
it works:

ID   FA9_BOVIN               Reviewed;         416 AA.
AC   P00741;
FT   CHAIN         1    416       Coagulation factor IX.
FT   CARBOHYD     53     53       O-linked (Glc...).
FT                                /FTId=CAR_000008.
FT                                Extra information.
SQ   SEQUENCE   416 AA;  46785 MW;  34A7DFE916330662 CRC64;
     YNSGKLEEFV RGNLERECKE EKCSFEEARE VFENTEKTTE FWKQYVDGDQ CESNPCLNGG
     MCKDDINSYE CWCQAGFEGT NCELDATCSI KNGRCKQFCK RDTDNKVVCS CTDGYRLAED
     QKSCEPAVPF PCGRVSVSHI SKKLTRAETI FSNTNYENSS EAEIIWDNVT QSNQSFDEFS
     RVVGGEDAER GQFPWQVLLH GEIAAFCGGS IVNEKWVVTA AHCIKPGVKI TVVAGEHNTE
     KPEPTEQKRN VIRAIPYHSY NASINKYSHD IALLELDEPL ELNSYVTPIC IADRDYTNIF
     SKFGYGYVSG WGKVFNRGRS ASILQYLKVP LVDRATCLRS TKFSIYSHMF CAGYHEGGKD
     SCQGDSGGPH VTEVEGTSFL TGIISWGEEC AMKGKYGIYT KVSRYVNWIK EKTKLT
//

However, when the extra information is tagged with a slash, as in the /NB line
below, it is lost:

ID   FA9_BOVIN               Reviewed;         416 AA.
AC   P00741;
FT   CHAIN         1    416       Coagulation factor IX.
FT   CARBOHYD     53     53       O-linked (Glc...).
FT                                /FTId=CAR_000008.
FT                                /NB=Extra information.
SQ   SEQUENCE   416 AA;  46785 MW;  34A7DFE916330662 CRC64;
     YNSGKLEEFV RGNLERECKE EKCSFEEARE VFENTEKTTE FWKQYVDGDQ CESNPCLNGG
     MCKDDINSYE CWCQAGFEGT NCELDATCSI KNGRCKQFCK RDTDNKVVCS CTDGYRLAED
     QKSCEPAVPF PCGRVSVSHI SKKLTRAETI FSNTNYENSS EAEIIWDNVT QSNQSFDEFS
     RVVGGEDAER GQFPWQVLLH GEIAAFCGGS IVNEKWVVTA AHCIKPGVKI TVVAGEHNTE
     KPEPTEQKRN VIRAIPYHSY NASINKYSHD IALLELDEPL ELNSYVTPIC IADRDYTNIF
     SKFGYGYVSG WGKVFNRGRS ASILQYLKVP LVDRATCLRS TKFSIYSHMF CAGYHEGGKD
     SCQGDSGGPH VTEVEGTSFL TGIISWGEEC AMKGKYGIYT KVSRYVNWIK EKTKLT
//

BioPerl 1.5.2 copes with both situations without problems.
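
To make the expected behaviour concrete, here is a minimal sketch of how the
FT lines of the second record could be folded into features. It assumes that
each continuation line is either free text (appended to the description) or a
slash-prefixed /tag=value entry (kept as a separate qualifier). The class
FtContinuationSketch and its methods are hypothetical names used only for
illustration; this is not BioJava's or BioPerl's actual parsing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not BioJava's UniProt parser.
class FtContinuationSketch {

    /** One feature: key, positions, free-text description and /tag=value qualifiers. */
    static final class Feature {
        final String key;
        final String from;
        final String to;
        final StringBuilder description = new StringBuilder();
        // Insertion-ordered so tags can be written back in their original order.
        final Map<String, String> qualifiers = new LinkedHashMap<>();

        Feature(String key, String from, String to) {
            this.key = key;
            this.from = from;
            this.to = to;
        }
    }

    /** Groups FT lines into features, keeping every slash-tagged continuation line. */
    static List<Feature> parseFeatures(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        Feature current = null;
        for (String line : lines) {
            if (!line.startsWith("FT")) {
                continue; // only feature-table lines are of interest here
            }
            // A blank key area (up to column 13) marks a continuation line.
            String keyColumns = line.substring(2, Math.min(13, line.length()));
            String rest = line.substring(2).trim();
            if (!keyColumns.trim().isEmpty()) {
                // New feature: key, from, to, then an optional description.
                String[] parts = rest.split("\\s+", 4);
                current = new Feature(parts[0],
                        parts.length > 1 ? parts[1] : "",
                        parts.length > 2 ? parts[2] : "");
                if (parts.length > 3) {
                    current.description.append(parts[3]);
                }
                features.add(current);
            } else if (current != null && !rest.isEmpty()) {
                if (rest.startsWith("/")) {
                    addQualifier(current, rest); // e.g. /FTId=... or /NB=...
                } else {
                    if (current.description.length() > 0) {
                        current.description.append(' ');
                    }
                    current.description.append(rest);
                }
            }
        }
        return features;
    }

    /** Stores "/tag=value." as tag -> value, dropping the slash and a trailing period. */
    static void addQualifier(Feature feature, String text) {
        String body = text.substring(1);
        if (body.endsWith(".")) {
            body = body.substring(0, body.length() - 1);
        }
        int eq = body.indexOf('=');
        if (eq < 0) {
            feature.qualifiers.put(body, "");
        } else {
            feature.qualifiers.put(body.substring(0, eq), body.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        List<Feature> features = parseFeatures(Arrays.asList(
                "FT   CHAIN         1    416       Coagulation factor IX.",
                "FT   CARBOHYD     53     53       O-linked (Glc...).",
                "FT                                /FTId=CAR_000008.",
                "FT                                /NB=Extra information."));
        for (Feature f : features) {
            System.out.println(f.key + " " + f.from + ".." + f.to
                    + " | " + f.description + " | " + f.qualifiers);
        }
    }
}
```

Under these assumptions the /NB entry ends up next to /FTId in the qualifier
map rather than being dropped, so a writer has everything it needs to emit
both continuation lines again.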

