[Biopython] Bio.GenBank .Scanner ?
Peter
biopython at maubp.freeserve.co.uk
Tue Oct 12 08:46:13 UTC 2010
On Mon, Oct 11, 2010 at 11:30 PM, Ara Kooser <akooser at unm.edu> wrote:
> Hello all,
>
> I found a partial answer to my question. I've downloaded all the GenBank
> files for Strep. sp. AA4. I am using SeqIO to look at the information in the
> files. The documentation recommends using SeqIO. I am searching for the tag
> that will only extract:
>      CDS             1..5256
>                      /locus_tag="StAA4_010100030484"
>                      /coded_by="complement(NZ_ACEV01000078.1:25146..40916)"
>                      /note="COG3321 Polyketide synthase modules and related
>                      proteins"
>                      /transl_table=11
>                      /db_xref="CDD:33130"
>
> this /coded_by="complement(NZ_ACEV01000078.1:25146..40916)" line from the
> GenBank files.
> The api documentation on-line discusses the parse_feature which is what I
> think I need. I am not sure the best way to pull out that one line.
I would not recommend using Bio.GenBank.Scanner directly for this task.
If you did want to do this, you would create your own consumer class
(probably as a subclass of BaseGenBankConsumer) and use this with
the GenBankScanner object. Your consumer would ignore most of the
parsing events, and focus on the CDS coded_by qualifier information.
> My current code is:
> from Bio import SeqIO
> gb_file = "sequences.gp"
> for gb_record in SeqIO.parse(open(gb_file, "r"), "genbank"):
>     gb_feature = gb_record.features[2]
>     print(gb_feature)
>
>
> Thank you for your time and help.
> Ara
Try something along these lines:
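from Bio import SeqIO
gb_file = "sequences.gp"
for gb_record in SeqIO.parse(open(gb_file, "r"), "genbank"):
    for gb_feature in gb_record.features:
        # Skip everything that is not a CDS feature
        if gb_feature.type != "CDS":
            continue
        # qualifiers is a dictionary, including the coded_by entry
        print(gb_feature.qualifiers)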
Now you will need some way to identify *which* of the potentially
many CDS features present in the GenBank file is the one you
care about. I would guess you got StAA4_010100030484 from
the BLAST hits, so you should filter on the locus_tag qualifier.
There is a related example here,
http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
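
As a rough sketch of that filtering step (not tested against your
files), something like the following should work. The names
find_coded_by and main are made up for illustration, and I am assuming
each record has at most one CDS with the locus_tag you want. Note that
Biopython stores each qualifier value as a list of strings, hence the
"in" test and the [0] index.

```python
def find_coded_by(records, locus_tag):
    """Return the coded_by string of the first CDS feature carrying
    the given locus_tag, or None if there is no such feature.

    records is any iterable of SeqRecord-like objects, for example
    the iterator returned by SeqIO.parse(...).
    """
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Each qualifier maps to a list of strings
            if locus_tag in feature.qualifiers.get("locus_tag", []):
                values = feature.qualifiers.get("coded_by", [])
                return values[0] if values else None
    return None


def main(gb_file="sequences.gp", locus_tag="StAA4_010100030484"):
    # Imported here so the helper above can be used without Biopython
    from Bio import SeqIO
    print(find_coded_by(SeqIO.parse(gb_file, "genbank"), locus_tag))
```

Calling main() with the file name and locus_tag from your message
should print the coded_by value if that CDS is in the file, and None
otherwise.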
Peter