[Biopython-dev] [Bug 2829] New: Biosequence.alphabet can be set to unknown after loading a nucleotide SeqRecord
bugzilla-daemon at portal.open-bio.org
Fri May 15 20:24:29 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2829
Summary: Biosequence.alphabet can be set to unknown after loading a nucleotide SeqRecord
Product: Biopython
Version: 1.49
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: david.wyllie at ndm.ox.ac.uk
Hi
I have done the following:
1. Loaded a small nucleotide FASTA file with SeqIO, setting the alphabet successfully.
2. Wrote it to a test database with BioSQL.
3. Reloaded it, at which point the reloaded object has a "SingleLetterAlphabet" alphabet and biosequence.alphabet is set to "unknown".
Is this expected?
My overall aim was to add some SeqFeatures to the loaded SeqRecord, but the record doesn't seem to be stored correctly even without any manipulation.
The script below demonstrates the problem. The system is Ubuntu 9 x64, Python 2.6, Biopython 1.49.
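One possible explanation, offered only as a hedged sketch and not as a description of Biopython's actual BioSQL code: suppose the loader picks the biosequence.alphabet value by checking whether the sequence's alphabet is a DNA, RNA or protein alphabet. A generic nucleotide alphabet is a parent of the DNA and RNA alphabets rather than either of them, so it would match none of those checks and fall through to "unknown". On reload, "unknown" would then map back to a plain single-letter alphabet. The stand-in alphabet classes and the functions alphabet_to_db_value and db_value_to_alphabet below are hypothetical, written only to illustrate that mapping.

```python
# Hypothetical stand-ins for an alphabet class hierarchy (assumption: DNA and
# RNA alphabets are subclasses of a more general nucleotide alphabet).
class SingleLetterAlphabet(object):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


def alphabet_to_db_value(alphabet):
    """Map an alphabet object to a biosequence.alphabet string (hypothetical)."""
    if isinstance(alphabet, DNAAlphabet):
        return "dna"
    if isinstance(alphabet, RNAAlphabet):
        return "rna"
    if isinstance(alphabet, ProteinAlphabet):
        return "protein"
    # A generic nucleotide alphabet is neither DNA nor RNA, so it lands here.
    return "unknown"


def db_value_to_alphabet(value):
    """Map a stored string back to an alphabet object (hypothetical)."""
    mapping = {"dna": DNAAlphabet, "rna": RNAAlphabet, "protein": ProteinAlphabet}
    # Anything unrecognised, including "unknown", becomes the generic base class.
    return mapping.get(value, SingleLetterAlphabet)()


if __name__ == "__main__":
    stored = alphabet_to_db_value(NucleotideAlphabet())
    reloaded = db_value_to_alphabet(stored)
    print("%s %s" % (stored, type(reloaded).__name__))
```

If this is what is happening, parsing the file with a DNA-specific alphabet such as generic_dna, rather than generic_nucleotide, might be enough to get "dna" stored. That is an untested assumption.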
#!/usr/bin/env python
from BioSQL import BioSeqDatabase
from Bio.Alphabet import generic_nucleotide
from Bio import SeqIO
from Bio import Seq
# define variables needed for testing
username="myusername"
password="mypassword"
hostname="localhost"
# we are going to try to load a nucleotide fasta file into a BioSQL database
# need a test file, with inputfile the file name;
#>test_sequence
#ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcggtcgtctccgaactt
inputfile="/home/dwyllie/test.faa"
# we want to create a new BioSQL database, called test
dbname="test"
dbdescription="test of alphabet storage"
# we also want to remove one if it exists, for the purposes of testing
server = BioSeqDatabase.open_database(driver="MySQLdb", db="bioseqdb",
                                      user=username, passwd=password, host=hostname)
# if the database doesn't exist, we get an error, so we trap for that
try:
    server.remove_database(dbname)
    server.adaptor.commit()
except KeyError:
    print "Attempt to remove", dbname, "failed; going on to create a new one"
server = BioSeqDatabase.open_database(driver="MySQLdb", db="bioseqdb",
                                      user=username, passwd=password, host=hostname)
db = server.new_database(dbname, description=dbdescription)
server.adaptor.commit()
# set up a list to hold the mycobacterial sequences
selectedrecords = []  # an empty list which we'll later write to the database
# ifh is the input file handle;
ifh = open(inputfile, "rU")
# set a counter
recordsread=0
for record in SeqIO.parse(ifh, "fasta", generic_nucleotide):
    # increment counter
    recordsread = recordsread + 1
    # just so we can reload it easily, we'll assign an id to this record;
    # however, the problem does not depend on this,
    # nor on the nature of the defline, as far as I can tell
    record.id = "IDENTIFIER_" + str(recordsread)
    print "** Note the sequence type of the Seq **"
    print record
    # note that up to this point it does appear to work, and the alphabet is
    # correct.
    selectedrecords.append(record)
print inputfile, "total found ", recordsread
ifh.close()
# write it to the bioSQL database
print "Writing sequences to database"
db.load(selectedrecords)
server.adaptor.commit()
# subsequent attempts to write the re-loaded object fail because no alphabet is
# defined
print "However, the alphabet hasn't been stored."
loadedrecord=db.lookup(gi="IDENTIFIER_1")
print "Displaying re-loaded record"
print loadedrecord
# this can be confirmed by running
sqlcmd="""
select * from
bioseqdb.biosequence,
bioseqdb.bioentry,
bioseqdb.biodatabase
where
biodatabase.biodatabase_id= bioentry.biodatabase_id and
biosequence.bioentry_id=bioentry.bioentry_id and
biodatabase.name="test"
"""
print "This can be confirmed by examining bioseqdb.biosequence.alphabet, which is set to unknown:", sqlcmd