[Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
Susan Wilson
smwilson at hpc.unm.edu
Tue Aug 14 14:10:53 UTC 2012
Hi,
I am parsing the GenBank (gb) files with Biopython. My problem is that none
of the seqfeature.strand values come back as the plus strand (value == 1).
The commands below are slightly abridged. (For instance, I have left out
the opening and closing of fout.) I read in
Homo_sapiens.GRCh37.68.chromosome.1.dat using SeqIO.read. The file
written by command [13] shows only "-1" and "None" as strand values. Is
there a bug in the parser? Or am I making a mistake of some sort?
Thanks.
Susan
In [10]: genome
Out[10]:
SeqRecord(seq=Seq('NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN',
Alphabet()), id='1GRCh37', name='1', description='Homo sapiens
chromosome 1 GRCh37 full sequence 1..249250621 reannotated via EnsEMBL',
dbxrefs=[])
In [11]: len(genome)
Out[11]: 249250621
In [12]: len(genome.features)
Out[12]: 109751
In [13]: for f in genome.features:
   ...:     fout.write(str(f.strand) + "~" + str(f.location) + \
   ...:                "~" + str(f.qualifiers.get('gene')) + "\n")
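
A quicker way to see which strand values occur is to tally them instead of scanning the output file. The following is only a minimal sketch and was not part of the original session: tally_strands and demo_features are hypothetical names, and the stand-in objects just mimic the f.strand attribute used in command [13]. With the real data one would presumably pass genome.features instead.

```python
from collections import Counter
from types import SimpleNamespace


def tally_strands(features):
    """Count how often each strand value (e.g. 1, -1, None) occurs."""
    return Counter(f.strand for f in features)


# Hypothetical stand-ins for parsed features; each one only carries the
# .strand attribute that command [13] reads. Replace demo_features with
# genome.features to check the real record.
demo_features = [SimpleNamespace(strand=s) for s in (1, -1, -1, None)]

counts = tally_strands(demo_features)
for strand, n in counts.items():
    print(strand, n)
```

If the tally for the real record has no entry for 1, the plus strand really is absent from the parsed features, and the problem is not in how the output file was written.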