[Biopython] Editing an Bio.Seq.Seq object

Saket Choudhary saketkc at gmail.com
Thu Mar 17 00:06:52 UTC 2016


Hi Sean,

If you are doing this using a for loop, won’t your
SeqIO.write(new_rec,"NewSeq.fasta","fasta") simply write the last record of
the loop.
I believe you should be passing a handle to SeqIO.write, with an append
flag.

Also, since you are creating a new Seq object from a new string altogether,
I don’t think tomutable is required, unless you want to do something like
this(the `append_handle` initialization being out of scope of for loop):

append_handle = open('new_records.fasta','a')
mutable_rec = refseq.tomutable()
for index,value in enumerate(mutable_rec):
    mutable_rec.seq[i] = filter_coverage(index,value)
SeqIO.write(mutable_rec, append_handle, 'fasta')

​

On 16 March 2016 at 13:36, Sean Brimer <skbrimer at gmail.com> wrote:

> Hello Biopython users,
>
> I'm trying to create a script that will allow me to add 'N' to a final
> fasta of a template-guided assembly. I have already posted this question on
> BioStars and received some good feedback and with it I feel I'm 99% done
> with what I am trying to complete.
>
> I am using python3 and Biopython 1.66
>
> The link to the BioStars post is here: https://www.biostars.org/p/179755/
>
> The code seems to do fine on making a dictionary of the coverage file and
> then making a set containing only the "no coverage" areas. I can edit the
> fasta header information i.e. new id, new description however I am unable
> to make an editted version of the Seq object. I have tried using the str()
> function and the .tomutable() function in several ways but none of them
> seem to be giving me my expected output. I suspect the error is in how I'm
> trying to call/edit the Seq function but from the tutorial guide I'm not
> sure why it's not working.
>
> Any guidance would be appreciated.
>
> Thank you,
> Sean
>
> my code is below:
>
>>
>> from Bio.SeqRecord import SeqRecord
>> from Bio.Alphabet import IUPAC
>> from Bio.Seq import Seq
>> from Bio import SeqIO
>> import argparse
>>
>
>
>> ## this makes a dict of the samtools depth coverage input file of all the
>> 0's
>> coverageDict = {}
>>
>
>
>> ## this loops over the input depth information and appends the dictionary
>> with
>> ## a key,value as genome position, depth of coverage
>> coverage = open("/home/sbrimer/Desktop/Columbia
>> ST/1819_1/coverage.txt","r")
>> for line in coverage:
>>     coverages = line.split("\t")
>>     coverageDict[coverages[1]] = coverages[2]
>> coverage.close()
>>
>
>
>> ## this should read in the sequence file as an oblect, change it to a
>> mutable
>> ## object, then anywhere the coverageDict value = 0. Replace that nt with
>> "N"
>>
>
>
>> missing = {index for index,value in coverageDict.items() if value == 0}
>> def filter_coverage(index,value):
>>     return 'N' if index in missing else value
>>
>
>
>> newSeqRec = []
>> with open("newfile.fasta","w") as handle:
>>     for seq_record in SeqIO.parse("Salmonella enterica
>> subsp.fasta","fasta"):
>>         refseq = seq_record.seq
>>         newSeq = "".join(filter_coverage(index,value) for index,value in
>> enumerate(refseq.tomutable(), start=1))
>>         newSeqRec = SeqRecord(Seq(newSeq),id ="new_id", description =
>> str(seq_record.description))
>>     SeqIO.write(newSeqRec, handle, "fasta")
>> handle.close()
>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20160316/9ae8dd4b/attachment-0001.html>


More information about the Biopython mailing list