[Biopython] Generator expression for SeqIO

Eric Talevich eric.talevich at gmail.com
Wed Dec 7 11:07:57 EST 2011


Mic,

You don't really need a generator expression here, but I recommend that you
read the Python Tutorial to learn how to use them anyway.
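
In case a concrete picture helps, here is a minimal sketch of the difference
between a list comprehension and a generator expression, in plain Python with
no Biopython. The sample rows and the names rows, ids_list, ids_gen, first and
rest are made up for illustration:

```python
# Two made-up rows in the same tab-delimited layout as input.txt
rows = ["test1\t0001\ta1\tAATTCC", "test2\t0002\tb7\tGGCC"]

# List comprehension: square brackets, builds the whole list immediately
ids_list = [row.split('\t')[0] for row in rows]

# Generator expression: parentheses, produces items one at a time on demand
ids_gen = (row.split('\t')[0] for row in rows)

first = next(ids_gen)   # pulls only the first item
rest = list(ids_gen)    # drains whatever is left
```

A generator can only be walked through once, so after the list() call above
ids_gen is exhausted.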

To solve your problem, here's one solution using Biopython and a list
comprehension (like a generator expression, but it builds the whole list in
memory up front):

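from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def row_to_seqrecord(row):
    r"""Convert a tab-delimited row to a SeqRecord.

    Row looks like:
    test1\t0001\ta1\tAATTCC

    Record looks like (conceptually):
    >test1_a1
    AATTCC
    """
    cells = [cell.strip() for cell in row.split('\t')]
    # An empty description keeps "<unknown description>" out of the FASTA header
    return SeqRecord(Seq(cells[3]), id=cells[0] + '_' + cells[2],
                     description="")

# Note: every row must have at least four tab-separated fields
with open('input.txt') as infile:
    records = [row_to_seqrecord(line) for line in infile]

SeqIO.write(records, 'output.txt', 'fasta')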


But the nice thing about FASTA format is that there's almost no structure
to it. Here's a simpler way to do it that doesn't use Biopython:

with open('input.txt') as infile:
    with open('output.fasta', 'w') as outfile:
        for line in infile:
            parts = [part.strip() for part in line.split('\t')]
            if len(parts) != 4:
                continue
            # Header
            outfile.write(">%s_%s\n" % (parts[0], parts[2]))
            # Sequence
            outfile.write(parts[3] + '\n')
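
If you'd still like something generator-flavoured, the same loop could be
wrapped in a generator function, roughly like this sketch. The name
fasta_lines and the sample data are hypothetical, and io.StringIO stands in
for the real input and output files:

```python
import io

def fasta_lines(rows):
    """Yield one FASTA entry (header + sequence) per well-formed row."""
    for row in rows:
        parts = [part.strip() for part in row.split('\t')]
        if len(parts) != 4:
            # Skip blank or malformed rows, as in the loop above
            continue
        yield ">%s_%s\n%s\n" % (parts[0], parts[2], parts[3])

# Made-up input: two good rows and one malformed row in between
sample = io.StringIO("test1\t0001\ta1\tAATTCC\nbad line\ntest2\t0002\tb7\tGGCC\n")
out = io.StringIO()

# writelines() accepts any iterable of strings, including a generator
out.writelines(fasta_lines(sample))
result = out.getvalue()
```

With real files you would pass the open input file to fasta_lines() and call
writelines() on the open output file instead.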



On Tue, Dec 6, 2011 at 11:41 PM, Mic <mictadlo at gmail.com> wrote:

> No worries, it was perfect.
>
> I have the following code and I do not know how to combine the *header* and
> *seq* variables from the '*with*' statement with a generator expression.
>
> from Bio import SeqIO
> from Bio.SeqRecord import SeqRecord
> from Bio.Seq import Seq
> from pprint import pprint
>
> if __name__ == '__main__':
>
>    *with* open('input.txt') as f:
>        for line in f:
>            try:
>                splited_line = line.split('\t')
>
>                *header* = splited_line[0] +'_'+ splited_line[2]
>                *seq* = splited_line[3]
>            except IndexError:
>                continue
>
>    fasta_file = open('output.fasta', 'w')
>    records = (SeqRecord(???), id=????, description="") for i in ???)
>
>    SeqIO.write(records, fasta_file, "fasta")
>
> Thank you in advance.
>
> On Thu, Dec 1, 2011 at 6:52 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> >
> >
> > On Wednesday, November 30, 2011, Mic <mictadlo at gmail.com> wrote:
> > > Thank you it is working.
> > >
> >
> > Excellent - sorry I couldn't think of a nice way to explain the syntax.
> >
> > Peter

