[Biopython] Parsing FASTA headers

Sheng Wang bsmagic at qq.com
Mon Nov 14 10:40:08 UTC 2016


Hello Alexey:
Maybe you could overload the object?


------------------ Original ------------------
From: "Alexey Morozov" <alexeymorozov1991 at gmail.com>
Date: Tue, Aug 23, 2016 11:13 AM
To: "biopython" <biopython at mailman.open-bio.org>
Subject: [Biopython] Parsing FASTA headers

Hello everyone.
Is there any support for FASTA dialects, so to say, in Biopython? For example, NCBI headers include a GI/new ID and a human-readable sequence name, and a good deal of them include the species name in square brackets. The ones on the JGI site include two of their sequence IDs and a shortened species name. MMETSP headers consist of lots and lots of tags. And so on and so forth: most databases have some internal standard for FASTA headers that potentially includes useful information.
Looking through the docs, I found only SeqRecord.id and SeqRecord.description. If I understood correctly, these just mean "stuff before or after the first \s, respectively". Can I get more fine-grained features without cooking up my own parser?
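
For illustration, here is a minimal sketch of what a hand-rolled parser for one dialect might look like. It uses only the standard library and works on the header string itself, so it could presumably be fed record.description from a SeqRecord (which, as far as I know, holds the whole title line in Biopython's FASTA parser). The function name parse_ncbi_header, the regular expression, and the sample header are hypothetical and not part of Biopython; real NCBI headers have more variants than this pattern covers.

```python
import re

# Hypothetical pattern for NCBI-style headers such as
#   "XP_000001.1 hypothetical protein [Homo sapiens]"
# It is an assumption made for this sketch, not an official format definition.
NCBI_HEADER = re.compile(
    r"^(?P<accession>\S+)\s+(?P<name>.*?)(?:\s*\[(?P<species>[^\[\]]+)\])?$"
)


def parse_ncbi_header(header):
    """Split a header into accession, name and (optional) species."""
    # Accept the raw FASTA line as well as the text after '>'.
    header = header.lstrip(">").strip()
    match = NCBI_HEADER.match(header)
    if match is None:
        # Header with an ID only: nothing else to extract.
        return {"accession": header, "name": "", "species": None}
    return match.groupdict()


if __name__ == "__main__":
    fields = parse_ncbi_header(">XP_000001.1 hypothetical protein [Homo sapiens]")
    print(fields["accession"], "|", fields["name"], "|", fields["species"])
```

Other dialects (JGI, MMETSP) would need their own patterns, so one option is a dictionary mapping a dialect name to a function like the one above.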


-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.


