[Biopython] Editing an Bio.Seq.Seq object
Sean Brimer
skbrimer at gmail.com
Wed Mar 16 20:36:21 UTC 2016
Hello Biopython users,
I'm trying to create a script that will allow me to add 'N' to a final
fasta of a template-guided assembly. I have already posted this question on
BioStars and received some good feedback and with it I feel I'm 99% done
with what I am trying to complete.
I am using python3 and Biopython 1.66
The link to the BioStars post is here: https://www.biostars.org/p/179755/
The code seems to do fine on making a dictionary of the coverage file and
then making a set containing only the "no coverage" areas. I can edit the
fasta header information i.e. new id, new description however I am unable
to make an editted version of the Seq object. I have tried using the str()
function and the .tomutable() function in several ways but none of them
seem to be giving me my expected output. I suspect the error is in how I'm
trying to call/edit the Seq function but from the tutorial guide I'm not
sure why it's not working.
Any guidance would be appreciated.
Thank you,
Sean
my code is below:
>
> from Bio.SeqRecord import SeqRecord
> from Bio.Alphabet import IUPAC
> from Bio.Seq import Seq
> from Bio import SeqIO
> import argparse
>
> ## this makes a dict of the samtools depth coverage input file of all the
> 0's
> coverageDict = {}
>
> ## this loops over the input depth information and appends the dictionary
> with
> ## a key,value as genome position, depth of coverage
> coverage = open("/home/sbrimer/Desktop/Columbia
> ST/1819_1/coverage.txt","r")
> for line in coverage:
> coverages = line.split("\t")
> coverageDict[coverages[1]] = coverages[2]
> coverage.close()
>
> ## this should read in the sequence file as an oblect, change it to a
> mutable
> ## object, then anywhere the coverageDict value = 0. Replace that nt with
> "N"
>
> missing = {index for index,value in coverageDict.items() if value == 0}
> def filter_coverage(index,value):
> return 'N' if index in missing else value
>
> newSeqRec = []
> with open("newfile.fasta","w") as handle:
> for seq_record in SeqIO.parse("Salmonella enterica
> subsp.fasta","fasta"):
> refseq = seq_record.seq
> newSeq = "".join(filter_coverage(index,value) for index,value in
> enumerate(refseq.tomutable(), start=1))
> newSeqRec = SeqRecord(Seq(newSeq),id ="new_id", description =
> str(seq_record.description))
> SeqIO.write(newSeqRec, handle, "fasta")
> handle.close()
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20160316/8be1aec9/attachment.html>
More information about the Biopython
mailing list