[Biopython] parsing fasta based on header
Peter Cock
p.j.a.cock at googlemail.com
Tue Nov 1 21:04:39 UTC 2011
On Tue, Nov 1, 2011 at 7:53 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Matthew,
>
> You can use Python generators for this. Here's a rough example:
>
> # generators for the two different groups
> seq_1 = (r for r in SeqIO.parse(open('QHM-clean.fasta', 'rU'), 'fasta') if
> r.id.startswith('1'))
> seq_2 = (r for r in SeqIO.parse(open('QHM-clean.fasta', 'rU'), 'fasta') if
> r.id.startswith('2'))
>
> # seqs, filenames pair list
> pairs = [(seq_1, 'file_1'), (seq_2, 'file_2')]
>
> # the actual write
> for seq, filename in pairs:
> SeqIO.write(seq, open(filename, 'w'), 'fasta')
>
> cheers,
> Bowo
Email does tend to mess up the indentation in Python :(
I'm pleased to see that's very similar to my answer earlier,
http://biostar.stackexchange.com/questions/13791/parsing-fasta-based-on-header/13793
By the way Wibowo, rather than this:
SeqIO.write(seq, open(filename, 'w'), 'fasta')
use this:
SeqIO.write(seq, filename, 'fasta')
It is shorter, and it also ensures the handle is closed
promptly on Jython/PyPy, where garbage collection
isn't as predictable as on normal CPython.
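
For reference, here is a rough, untested-against-your-data sketch of the
whole thing with filenames passed straight to SeqIO. The helper name
split_by_prefix, the tiny example input and the temporary directory are
all made up for illustration; only SeqIO.parse and SeqIO.write come
from Biopython.

```python
import os
import tempfile

from Bio import SeqIO


def split_by_prefix(in_path, out_paths):
    """Write records whose id starts with each prefix to its own file.

    out_paths maps an id prefix (e.g. '1') to an output filename.
    Returns a dict of prefix -> number of records written.
    (Hypothetical helper, not part of Biopython.)
    """
    counts = {}
    for prefix, out_path in out_paths.items():
        # Generator expression: records are filtered lazily, one at a time.
        records = (r for r in SeqIO.parse(in_path, "fasta")
                   if r.id.startswith(prefix))
        # Passing filenames (not handles) lets SeqIO open and close the files.
        counts[prefix] = SeqIO.write(records, out_path, "fasta")
    return counts


# Tiny made-up input so the sketch can be run end to end.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "example.fasta")
with open(in_path, "w") as handle:
    handle.write(">1_a\nACGT\n>2_b\nGGCC\n>1_c\nTTAA\n")

counts = split_by_prefix(in_path, {
    "1": os.path.join(workdir, "file_1"),
    "2": os.path.join(workdir, "file_2"),
})
```

The input file is parsed once per prefix, as in the generator version
quoted above, so nothing has to be held in memory. If you do need to
pass a handle rather than a filename, opening it in a "with" block gives
the same prompt close.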
Peter