[Biopython-dev] [Bug 2829] New: Biosequence.alphabet can be set to unknown after loading a nucleotide SeqRecord

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri May 15 20:24:29 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2829

           Summary: Biosequence.alphabet can be set to unknown after loading
                    a nucleotide SeqRecord
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: david.wyllie at ndm.ox.ac.uk


Hi

I have done the following:
1. Loaded a small nucleotide FASTA file with SeqIO, setting the alphabet
successfully.
2. Written it to a test database with BioSQL.
3. Reloaded it, at which point the reloaded object has a "SingleLetterAlphabet"
alphabet and biosequence.alphabet is set to unknown.

Is this expected?

The overall objective was to add some SeqFeatures to the loaded SeqRecord, but
the record doesn't seem to be stored correctly even without any manipulations.

The script below demonstrates the problem. The system is Ubuntu 9 x64 /
Python 2.6 / Biopython 1.49.

#!/usr/bin/env python

from BioSQL import BioSeqDatabase
from Bio.Alphabet import generic_nucleotide
from Bio import SeqIO
from Bio import Seq

# define variables needed for testing
username="myusername"
password="mypassword"
hostname="localhost"

# we are going to try to load a nucleotide fasta file into a BioSQL database
# this needs a test file (inputfile is its name) with the following content:
#>test_sequence
#ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcggtcgtctccgaactt
inputfile="/home/dwyllie/test.faa"

# we want to create a new BioSQL database, called test
dbname="test"
dbdescription="test of alphabet storage"

# we also want to remove one if it exists, for the purposes of testing
server = BioSeqDatabase.open_database(driver="MySQLdb", db="bioseqdb",
                                      user=username, passwd=password,
                                      host=hostname)
# if the database doesn't exist, we get an error, so we trap for that
try:
        server.remove_database(dbname)
        server.adaptor.commit()
except KeyError:
        print "Attempt to remove ",dbname," failed; going on to create a new one"

server = BioSeqDatabase.open_database(driver="MySQLdb", db="bioseqdb",
                                      user=username, passwd=password,
                                      host=hostname)
db = server.new_database(dbname, description=dbdescription)
server.adaptor.commit()

# set up an empty list to hold the mycobacterial sequences, which we'll later
# write to the database
selectedrecords = []

# ifh is the input file handle;
ifh = open(inputfile, "rU")

# set a counter
recordsread=0

for record in SeqIO.parse(ifh, "fasta", generic_nucleotide):

        # increment counter
        recordsread=recordsread+1

        # just so we can reload it easily, we'll assign an id to this record
        # however, the problem does not depend on this,
        # nor on the nature of the defline, as far as I can tell
        record.id="IDENTIFIER_"+str(recordsread)

        print "** Note the sequence type of the Seq ** "
        print record

        # note that to this point it does appear to work, and the alphabet is
        # correct.
        selectedrecords.append(record)

print inputfile, "total found ", recordsread
ifh.close()

# write it to the bioSQL database
print "Writing sequences to database"
db.load(selectedrecords)
server.adaptor.commit()

# subsequent attempts to write the re-loaded object fail because no alphabet is
# defined
print "However, the alphabet hasn't been stored."
loadedrecord=db.lookup(gi="IDENTIFIER_1")
print "Displaying re-loaded record"
print loadedrecord

# this can be confirmed by running
sqlcmd="""
select * from 
bioseqdb.biosequence,
bioseqdb.bioentry, 
bioseqdb.biodatabase
where 
biodatabase.biodatabase_id= bioentry.biodatabase_id  and
biosequence.bioentry_id=bioentry.bioentry_id and
biodatabase.name="test"

"""

print "This can be confirmed by examining bioseqdb.biosequence.alphabet, which is set to unknown; ", sqlcmd
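
The script above only prints the query, so here is a rough sketch of how the
stored alphabet could be checked from Python. It does not touch the MySQL
server: it assumes an in-memory SQLite database with hypothetical, cut-down
versions of the biodatabase, bioentry and biosequence tables, plus one made-up
row mimicking what I see. The helper name stored_alphabets is my own
invention, not part of BioSQL.

```python
import sqlite3

# Hypothetical, cut-down versions of three BioSQL tables; the real schema
# has more columns and constraints. The rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE biodatabase (biodatabase_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY,
                       biodatabase_id INTEGER, accession TEXT);
CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY,
                          alphabet TEXT, seq TEXT);
INSERT INTO biodatabase VALUES (1, 'test');
INSERT INTO bioentry VALUES (1, 1, 'IDENTIFIER_1');
INSERT INTO biosequence VALUES (1, 'unknown', 'ttgaccgatg');
""")


def stored_alphabets(cursor, dbname):
    """Return (accession, alphabet) pairs for every sequence in dbname."""
    cursor.execute(
        "SELECT bioentry.accession, biosequence.alphabet "
        "FROM biosequence "
        "JOIN bioentry ON biosequence.bioentry_id = bioentry.bioentry_id "
        "JOIN biodatabase "
        "ON biodatabase.biodatabase_id = bioentry.biodatabase_id "
        "WHERE biodatabase.name = ?",
        (dbname,)
    )
    return cursor.fetchall()


rows = stored_alphabets(cur, "test")
print(rows)
```

Against the real database the same join would be run through a MySQLdb
cursor, where the parameter placeholder is %s rather than ?.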

