[Biopython] Regarding blast record report
Ahmad Khalifa
underoath006 at gmail.com
Thu Nov 8 16:57:09 UTC 2018
Hello,
I want to extract certain information from Biopython's BLAST output.
The hit titles in the header contain varying amounts of information, for
example:
gi|1335041855|gb|PNW76469.1| hypothetical protein CHLRE_11g467616v5
[Chlamydomonas reinhardtii]
gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas
reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName:
Full=Intraflagellar transport protein 56; AltName: Full=Abnormal dye
filling protein 13; AltName: Full=Tetratricopeptide repeat protein 26
homolog; Short=TPR repeat protein 26 homolog
gi|1335043717|gb|PNW78329.1| hypothetical protein CHLRE_09g401700v5
[Chlamydomonas reinhardtii]
I wonder what exactly is contained in this output: what are gi and gb? Why
do I sometimes get a RefSeq or UniProt accession but not always? The same
information is not consistently present, which makes the titles very
difficult to mine.
Is it possible to retrieve a UniProt accession for my hits, or a gene name
that I can map to an accession using UniProt's API?
What I really want is to mine the title and get every piece of information
separately (where it exists, of course). Are there parsers that do that?
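To make the goal concrete, here is a rough sketch of the kind of splitting I
have in mind. It assumes each title is a series of entries separated by " >",
and that each entry looks like gi|number|db|accession|name description
[organism]; that layout is only what I infer from the examples above, not
something I have confirmed. parse_title is a hypothetical helper of my own,
not a Biopython function.

```python
import re

# Assumed layout of one entry: gi|<number>|<db>|<accession>|<name> <text>
# (inferred from the example titles, not taken from any specification).
ENTRY_RE = re.compile(
    r"gi\|(?P<gi>\d+)\|(?P<db>[^|]+)\|(?P<accession>[^|]*)\|"
    r"(?P<name>\S*)\s*(?P<rest>.*)"
)
# Optional trailing "[Organism name]" at the end of an entry.
ORGANISM_RE = re.compile(r"\s*\[(?P<organism>[^\[\]]+)\]$")


def parse_title(title):
    """Hypothetical helper: split a hit title into one dict per entry."""
    entries = []
    # Collapse line wraps and repeated spaces, then split merged entries.
    for chunk in " ".join(title.split()).split(" >"):
        match = ENTRY_RE.fullmatch(chunk)
        if match is None:
            # Keep anything that does not fit the assumed layout untouched.
            entries.append({"raw": chunk})
            continue
        fields = match.groupdict()
        rest = fields.pop("rest")
        org = ORGANISM_RE.search(rest)
        fields["organism"] = org.group("organism") if org else None
        fields["description"] = rest[: org.start()] if org else rest
        entries.append(fields)
    return entries


title = (
    "gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas "
    "reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName: "
    "Full=Intraflagellar transport protein 56"
)
for entry in parse_title(title):
    print(entry["db"], entry["accession"], entry["organism"])
```

With something like this I could keep only the entries whose db field is "sp"
and use their accession, but it breaks as soon as a title deviates from the
assumed layout, which is why I would prefer an existing parser.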
Best regards.