[Biopython] Editing an Bio.Seq.Seq object

Sean Brimer skbrimer at gmail.com
Wed Mar 16 20:33:26 UTC 2016


Hello Biopython users,

I'm trying to create a script that will allow me to add 'N' to a final
fasta of a template-guided assembly. I have already posted this question on
BioStars and received some good feedback and with it I feel I'm 99% done
with what I am trying to complete.

I am using python3 and Biopython 1.66

The link to the BioStars post is here: https://www.biostars.org/p/179755/

The code seems to do fine on making a dictionary of the coverage file and
then making a set containing only the "no coverage" areas. I can edit the
fasta header information i.e. new id, new description however I am unable
to make an editted version of the Seq object. I have tried using the str()
function and the .tomutable() function in several ways but none of them
seem to be giving me my expected output. I suspect the error is in how I'm
trying to call/edit the Seq function but from the tutorial guide I'm not
sure why it's not working.

Any guidance would be appreciated.

Thank you,
Sean

my code is below:

>
> from Bio.SeqRecord import SeqRecord
> from Bio.Alphabet import IUPAC
> from Bio.Seq import Seq
> from Bio import SeqIO
> import argparse
>


> ## this makes a dict of the samtools depth coverage input file of all the
> 0's
> coverageDict = {}
>


> ## this loops over the input depth information and appends the dictionary
> with
> ## a key,value as genome position, depth of coverage
> coverage = open("/home/sbrimer/Desktop/Columbia
> ST/1819_1/coverage.txt","r")
> for line in coverage:
>     coverages = line.split("\t")
>     coverageDict[coverages[1]] = coverages[2]
> coverage.close()
>


> ## this should read in the sequence file as an oblect, change it to a
> mutable
> ## object, then anywhere the coverageDict value = 0. Replace that nt with
> "N"
>


> missing = {index for index,value in coverageDict.items() if value == 0}
> def filter_coverage(index,value):
>     return 'N' if index in missing else value
>


> newSeqRec = []
> with open("newfile.fasta","w") as handle:
>     for seq_record in SeqIO.parse("Salmonella enterica
> subsp.fasta","fasta"):
>         refseq = seq_record.seq
>         newSeq = "".join(filter_coverage(index,value) for index,value in
> enumerate(refseq.tomutable(), start=1))
>         newSeqRec = SeqRecord(Seq(newSeq),id ="new_id", description =
> str(seq_record.description))
>     SeqIO.write(newSeqRec, handle, "fasta")
> handle.close()
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20160316/8fe57c85/attachment.html>


More information about the Biopython mailing list