[Biopython] Parsing FASTA headers

Alexey Morozov alexeymorozov1991 at gmail.com
Tue Aug 23 03:13:21 UTC 2016


Hello everyone.
Is there any support for FASTA dialects, so to speak, in Biopython? For
example, NCBI headers include a GI/new ID and a human-readable sequence name,
and a good many of them include the species name in square brackets. The ones
on the JGI site include two of their sequence IDs and a shortened species
name. MMETSP headers consist of lots and lots of tags. And so on and so
forth: most databases have some internal standard for FASTA headers that
potentially includes useful information.
Looking through the docs, I found only SeqRecord.id and SeqRecord.description.
If I understood correctly, these just mean "the stuff before and after the
first \s, respectively". Can I get more fine-grained features without cooking
up my own parser?
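
To make the question concrete, here is a minimal sketch of what a hand-rolled parser for the NCBI-style case might look like. It uses only the standard library; split_ncbi_style_header is a hypothetical helper and not part of Biopython, and the header in the usage note is made up for illustration. It assumes the species name, when present, is the last thing on the line and sits in one pair of square brackets.

```python
import re

# Assumption: the species name is the final bracketed group on the line.
_SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def split_ncbi_style_header(header):
    """Split a FASTA header into (seq_id, name, species).

    Hypothetical helper, not a Biopython function. species is None
    when the header does not end with a bracketed group.
    """
    # Accept the raw line with or without the leading ">".
    header = header.lstrip(">").strip()

    # The ID is everything up to the first run of whitespace.
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""

    match = _SPECIES_RE.search(rest)
    if match:
        species = match.group(1).strip()
        name = rest[:match.start()].strip()
    else:
        species = None
        name = rest
    return seq_id, name, species
```

Calling split_ncbi_style_header(">XP_000001.1 hypothetical protein [Homo sapiens]") gives ("XP_000001.1", "hypothetical protein", "Homo sapiens"). As far as I can tell, SeqRecord.description for a FASTA record holds the whole title line including the ID, so it could be passed to this function unchanged. Other dialects (JGI, MMETSP) would each need their own pattern.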


-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.