[Biopython] Parsing FASTA headers

Alexey Morozov alexeymorozov1991 at gmail.com
Mon Nov 14 10:52:47 UTC 2016


I eventually just wrote a simple function that takes a SeqRecord, parses the
header, and returns a new SeqRecord with its annotations set. I had just hoped
someone had already built a general-purpose solution.
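
A minimal sketch of what such a function might look like, not the code
actually used. It assumes an NCBI-style header of the form
"<id> <name> [<species>]". The names parse_ncbi_description and
annotate_record are hypothetical, and so are the "name" and "organism"
annotation keys. The record is only required to have .id, .description and
.annotations attributes, as a Biopython SeqRecord does, so the block itself
needs nothing beyond the standard library.

```python
import copy
import re

# Assumed layout after the id: "<free-text name> [<species>]".
# The trailing bracketed species is optional.
_SPECIES_RE = re.compile(r"^(?P<name>.*?)\s*\[(?P<species>[^\[\]]+)\]\s*$")


def parse_ncbi_description(description, seq_id=""):
    """Split an NCBI-style description into a name and an optional species."""
    text = description.strip()
    # Biopython's FASTA parser keeps the id as the first word of
    # .description, so drop it when it is present.
    parts = text.split(None, 1)
    if seq_id and parts and parts[0] == seq_id:
        text = parts[1].strip() if len(parts) > 1 else ""
    match = _SPECIES_RE.match(text)
    if match:
        return {"name": match.group("name"), "organism": match.group("species")}
    return {"name": text}


def annotate_record(record):
    """Return a shallow copy of `record` with parsed fields in .annotations."""
    new_record = copy.copy(record)
    # Build a fresh dict so the original record's annotations stay untouched.
    new_record.annotations = dict(record.annotations)
    new_record.annotations.update(
        parse_ncbi_description(record.description, record.id)
    )
    return new_record
```

With Biopython installed, this could be applied to each record yielded by
SeqIO.parse(handle, "fasta"). Other header dialects (JGI, MMETSP) would each
need their own parsing function in place of parse_ncbi_description.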

2016-11-14 18:40 GMT+08:00 Sheng Wang <bsmagic at qq.com>:

> Hello Alexey:
> Maybe you could overload the object?
>
>
> ------------------ Original ------------------
> From:  "Alexey Morozov";<alexeymorozov1991 at gmail.com>;
> Date:  Tue, Aug 23, 2016 11:13 AM
> To:  "biopython"<biopython at mailman.open-bio.org>;
> Subject:  [Biopython] Parsing FASTA headers
>
> Hello everyone.
> Is there any support for FASTA dialects, so to speak, in Biopython? For
> example, NCBI headers include the GI/new ID and a human-readable sequence
> name, and a good deal of them include the species name in square brackets.
> Those on the JGI site include two of their sequence IDs and a shortened
> species name. MMETSP headers consist of lots and lots of tags. And so on and
> so forth: most databases have some internal standard for FASTA headers that
> potentially includes useful information.
> Looking through the docs, I found only SeqRecord.id and
> SeqRecord.description. If I understood correctly, these just mean "stuff
> before or after the first \s (whitespace), respectively". Can I get more
> fine-grained features without cooking up my own parser?
>
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.




-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.