[Biopython-dev] [Bug 2907] New: When a genomic record has been loaded using eFetch, if it is written to genbank format the header line refers to 'aa'
bugzilla-daemon at portal.open-bio.org
Mon Aug 24 21:34:02 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2907
Summary: When a genomic record has been loaded using eFetch, if
it is written to genbank format the header line refers
to 'aa'
Product: Biopython
Version: 1.51b
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: david.wyllie at ndm.ox.ac.uk
When a genomic record has been loaded using eFetch, if it is written to genbank
format the header line refers to 'aa' not 'bp' although the .seq.alphabet is
set (correctly, I think) to generic_dna.
The background here is that we're annotating some viral genomes computationally
(however, the annotation isn't necessary for the problem here, see below) and
then writing the output to .gb format. After this we load the file using
LaserGene (a commercial sequence editing program) to have a look at it etc.
This doesn't work terribly well because of the 'aa' designation in the header
line. Apart from this, the export seems ok.
I'm using a git download from mid-June 09.
Here is an example that illustrates this:
# load dependencies
from Bio import Entrez
from Bio import SeqIO
from Bio import SeqRecord
from Bio.Alphabet import generic_protein, generic_dna
# get a sequence from Genbank
print "going to recover a sequence from genbank...."
ifh = Entrez.efetch(db="nucleotide", id="DQ923122", rettype="gb")
# parse the file handle
recordlist = []
print "OK, got the records from genbank, parsing ..."
for record in SeqIO.parse(ifh, "genbank"):
    recordlist.append(record)
ifh.close()
# write it to a file
for thisrecord in recordlist:
    # confirm it's dna
    assert type(thisrecord.seq.alphabet) == type(generic_dna), \
        "We are supposed to be dealing with a DNA sequence, but we aren't, can't continue."
    # write to gb
    ofn = thisrecord.id + ".gb"
    print "Writing thisrecord to ", ofn
    ofh = open(ofn, "w")
    SeqIO.write([thisrecord], ofh, "gb")
    # close the handle so the output is flushed to disk
    ofh.close()
exit()
# top lines of the genbank file reads as follows
#
#LOCUS DQ923122 34250 aa DNA VRL 01-JAN-1980
#DEFINITION Human adenovirus 52 isolate T03-2244, complete genome.
#ACCESSION DQ923122
#VERSION DQ923122.2 GI:124375632
#KEYWORDS
#SOURCE Human adenovirus 52
# ORGANISM Human adenovirus 52
# Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus;
# unclassified Human adenoviruses
#FEATURES Location/Qualifiers
# source 1..34250
# /country="USA"
# /isolate="T03-2244"
# /mol_type="genomic DNA"
# /organism="Human adenovirus 52"
# /db_xref="taxon:332179
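The symptom can be checked without LaserGene by reading the units token on the LOCUS line of the written file. What follows is only a rough sketch, not part of Biopython: the helper names locus_units and fix_locus_units are made up for illustration, and both assume the units are the fourth whitespace-separated field of the LOCUS line, as in the output above. The second helper is a stopgap that patches the text after writing; it does not address the underlying writer behaviour.

```python
def locus_units(genbank_text):
    """Return the units token (e.g. 'bp' or 'aa') from the first LOCUS line, or None."""
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            # Assumed field order: LOCUS, name, length, units, ...
            if len(fields) >= 4:
                return fields[3]
            return None
    return None


def fix_locus_units(genbank_text):
    """Swap ' aa ' for ' bp ' on the first LOCUS line, but only if it mentions DNA."""
    lines = genbank_text.splitlines(True)  # keep line endings
    for i, line in enumerate(lines):
        if line.startswith("LOCUS"):
            if " DNA" in line:
                # Same-length replacement, so column positions are preserved.
                lines[i] = line.replace(" aa ", " bp ", 1)
            break
    return "".join(lines)


# Header modelled on the output above (column spacing is illustrative).
sample = (
    "LOCUS       DQ923122   34250 aa    DNA   VRL 01-JAN-1980\n"
    "DEFINITION  Human adenovirus 52 isolate T03-2244, complete genome.\n"
)
```

Applied to a header like the one above, locus_units(sample) gives "aa", and locus_units(fix_locus_units(sample)) gives "bp". A LOCUS line that does not mention DNA is left untouched.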
Thank you for any advice you have to offer.