[Biopython] Parsing FASTA headers

Peter Cock p.j.a.cock at googlemail.com
Mon Nov 14 13:21:06 UTC 2016


That's what I'd do too, tailored to match wherever the
FASTA file came from, because every provider does
their own thing - or multiple variants of it.
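
For illustration only, here is a minimal sketch of that kind of helper for
one dialect: NCBI-style title lines with an optional trailing species name
in square brackets. This is not Alexey's actual function. The names
parse_ncbi_header and annotate_record, and the annotation keys "title" and
"organism", are assumptions made for this example. It only relies on the
record having .description and .annotations, as a SeqRecord does.

```python
import re

# Matches a title that ends in "[Species name]", as many NCBI protein
# headers do. The lazy "title" group leaves earlier bracketed text alone.
_SPECIES_RE = re.compile(r"^(?P<title>.*?)\s*\[(?P<organism>[^\[\]]+)\]\s*$")


def parse_ncbi_header(header):
    """Split an NCBI-style FASTA title line into a dict of fields.

    Hypothetical helper: returns the ID (first whitespace-delimited token),
    the human-readable title, and the organism (None if no trailing
    bracketed name is present).
    """
    header = header.lstrip(">").strip()
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    rest = parts[1].strip() if len(parts) > 1 else ""

    fields = {"seq_id": seq_id, "title": rest, "organism": None}
    match = _SPECIES_RE.match(rest)
    if match:
        fields["title"] = match.group("title")
        fields["organism"] = match.group("organism")
    return fields


def annotate_record(record):
    """Store the parsed header fields in record.annotations.

    Modifies the record in place and returns it, so it can be used
    inside a generator expression over parsed records.
    """
    fields = parse_ncbi_header(record.description)
    record.annotations["title"] = fields["title"]
    if fields["organism"] is not None:
        record.annotations["organism"] = fields["organism"]
    return record
```

With Biopython installed, this could be applied while reading a file, e.g.
(annotate_record(r) for r in SeqIO.parse("proteins.fasta", "fasta")), where
the file name is a placeholder. For the "fasta" format record.description
holds the full title line including the ID, which is why the sketch splits
the ID off again. Other providers (JGI, MMETSP, ...) would need their own
parsing function.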

Peter

On Mon, Nov 14, 2016 at 10:52 AM, Alexey Morozov
<alexeymorozov1991 at gmail.com> wrote:
> I eventually just wrote a simple function that took a SeqRecord, parsed the
> header and returned a new SeqRecord with annotations set. I had just hoped
> someone had already built a general-purpose solution.
>
> 2016-11-14 18:40 GMT+08:00 Sheng Wang <bsmagic at qq.com>:
>>
>> Hello Alexey:
>> Maybe you could overload the object?
>>
>>
>> ------------------ Original ------------------
>> From:  "Alexey Morozov";<alexeymorozov1991 at gmail.com>;
>> Date:  Tue, Aug 23, 2016 11:13 AM
>> To:  "biopython"<biopython at mailman.open-bio.org>;
>> Subject:  [Biopython] Parsing FASTA headers
>>
>> Hello everyone.
>> Is there any support for FASTA dialects, so to say, in Biopython? For
>> example, NCBI headers include a GI/new ID, a human-readable sequence name,
>> and a good deal of them include the species name in square brackets. Those
>> on the JGI site include two of their sequence IDs and a shortened species
>> name. MMETSP headers consist of lots and lots of tags. And so on and so
>> forth: most databases have some internal standard for FASTA headers that
>> potentially includes useful information.
>> Looking through the docs, I found only SeqRecord.id and
>> SeqRecord.description. If I understood correctly, this just means "Stuff
>> before or after first \s, respectively". Can I get more fine-grained
>> features without cooking up my own parser?
>>
>>
>> --
>> Alexey Morozov,
>> LIN SB RAS, bioinformatics group.
>> Irkutsk, Russia.
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
