[Biopython] SeqIO.parse for imgt
Liu, Chang
cliu32 at wustl.edu
Fri Nov 4 17:21:12 UTC 2016
Perfect! Thank you so much!!
-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock at googlemail.com]
Sent: Friday, November 04, 2016 12:12 PM
To: Liu, Chang <cliu32 at wustl.edu>
Cc: biopython at mailman.open-bio.org
Subject: Re: [Biopython] SeqIO.parse for imgt
On Fri, Nov 4, 2016 at 5:03 PM, Liu, Chang <cliu32 at wustl.edu> wrote:
> ID before 3.16.0 has only three semicolons, which was compatible with
> the 'imgt' parser at the time.
> ID HLA02803 standard; DNA; HUM; 1883 BP.
Yes, the EMBL/IMGT header line has been through various changes.
The current version in the hla.dat file is not one we've seen before, though. It looks similar to recent EMBL files, but with one field missing.
> One additional question I have is regarding the features:
>...
>
> The features in the original file look like this:
> FT   source          1..1883
> FT                   /organism="Homo sapiens"
> FT                   /mol_type="genomic DNA"
> FT                   /db_xref="taxon:9606"
> FT                   /ethnic="Caucasoid"
> FT                   /cell_line="QBL"
> FT   CDS             join(499..570,702..950,1189..1384)
> FT                   /codon_start=1
> FT                   /gene="HLA-V"
> FT                   /allele="HLA-V*01:01:01:03"
> FT                   /product="MHC Class I HLA-V*01:01:01:03 sequence"
> FT   UTR             1..498
> FT   exon            499..570
> FT                   /number="1"
> FT   intron          571..701
> FT                   /number="1"
> FT   exon            702..950
> FT                   /number="2"
> FT   intron          951..1188
> FT                   /number="2"
> FT   exon            1189..1384
> FT                   /number="3"
> FT   UTR             1385..1883
>
> My understanding is that the exon number was not captured in the
> features after parsing. Is this correct? The exon numbers are very
> important for downstream applications, because many analyses need
> to extract exons 2 and 3 for class I HLA genes. If exons are not
> labeled in the features, I wouldn't know which exons to keep. Could this
> information be retained after parsing? Thank you for your help!!!
> Chang
It is recorded. Try this for a more detailed output from Biopython:
for f in record.features:
    print(f.qualifiers)
In theory there could be multiple entries for each qualifier key, so the dictionary gives you a list. You'd want f.qualifiers["number"][0]
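To pull out just exons 2 and 3, something along these lines might work. This is only a rough sketch: the helper name exons_by_number and its wanted parameter are made up for illustration, and it assumes record is a SeqRecord that the "imgt" parser has already read successfully.

```python
def exons_by_number(record, wanted=("2", "3")):
    """Return {exon number: extracted sequence} for the requested exons.

    Hypothetical helper, not part of Biopython. `record` is assumed to be
    a SeqRecord, e.g. one yielded by SeqIO.parse(handle, "imgt").
    """
    exons = {}
    for f in record.features:
        if f.type != "exon":
            continue
        # Qualifier values are lists of strings; take the first entry,
        # or None if the exon carries no /number qualifier.
        number = f.qualifiers.get("number", [None])[0]
        if number in wanted:
            # extract() slices the parent sequence at the feature location.
            exons[number] = f.extract(record.seq)
    return exons
```

The exon numbers are compared as strings ("2", not 2), since that is how the qualifier values are stored. Exons without a /number qualifier are skipped.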
Peter