[Biopython] Filter, sort and write FASTA file

Mic mictadlo at gmail.com
Tue Dec 15 02:55:10 UTC 2015


Thank you Peter. It works.

Best wishes,

Mic

On Tue, Dec 15, 2015 at 1:08 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> On Mon, Dec 14, 2015 at 9:50 AM, Mic <mictadlo at gmail.com> wrote:
> > Hello,
> >
> > I run MACS2 to find out were peaks are located (see attached).
> > I wrote a script (see attached) which:
> >
> > read the peaks information
> > extract the sequence from genome.fasta with help of the peak information
> > the extracted sequences are saved in a new FASTA file.
> >
> > The MACS_peaks.csv file contains a column fold_enrichment and the script
> > safe this information in peak.fold_enrichment.
> >
> > How is it possible to sort the output sequences with highest
> > peak.fold_enrichment?
>
> You are building a list of SeqRecord objects in memory, so you
> would want to sort that before passing it to SeqIO.write.
>
> e.g. Right now your code does something like this:
>
> records = [] # list of SeqRecord to be written
> for record in SeqIO.parse(open(fasta_file_name, "rU"), "fasta"):
>     if ...:
>         records.append(SeqRecord(...))
> SeqIO.write(records, out_file, "fasta")
>
>
>
> The easiest way to sort the records would be to make a list
> of tuples (score, SeqRecord) which you can then sort. Then
> once sorted, pull out just the SeqRecord objects for output.
>
>
> score_and_records = []  # list of (score, record)
> for record in SeqIO.parse(open(fasta_file_name, "rU"), "fasta"):
>     if ...:
>         score_and_records.append((peak.fold_enrichment, SeqRecord(...)))
> # This will sort by score since that is first in the tuples:
> score_and_records.sort()
> records = [record for (score, record) in score_and_records]
> SeqIO.write(records, out_file, "fasta")
>
>
> Does that make sense?
>
> Peter
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20151215/2c7e3213/attachment.html>


More information about the Biopython mailing list