[Biopython] random peptide sequences

ferreirafm at usp.br
Thu Apr 5 00:01:42 UTC 2012


Hi Peter,
It seems I'm getting there, but I can't write the records to a file
using SeqIO.write as usual.
Fred

code:
import random
import sys

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def random_seq(fastafile):
    records = []
    query = SeqIO.read(fastafile, "fasta")
    # Split the construct into its peptides on the GPGPG linker.
    peplist = str(query.seq).split('GPGPG')
    peptup = tuple(peplist)
    for pep in peptup:
        outf = open("test.fasta", "w")
        peplist.remove(pep)
        for k in range(10):
            # Shuffle the other peptides, then put the current one first.
            random.shuffle(peplist, random.random)
            peplist.insert(0, pep)
            rec = SeqRecord('GPGPG'.join(peplist), id="pep%s" % k)
            records.append(rec)
            print 'id: %s\nSeq: %s\n' % (rec.id, rec.seq)
            peplist.remove(pep)
        print records
        SeqIO.write(records, outf, "fasta")
        outf.close()
        sys.exit(1)
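
The shuffling scheme in the function above (split on the GPGPG linker, hold one peptide at the front, shuffle the rest, rejoin) can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the original script: `shuffled_constructs`, `fixed_index`, `n`, `rng` and the toy input are hypothetical names and data, and it is written for Python 3 rather than the Python 2.7 used in the post.

```python
import random

LINKER = "GPGPG"  # linker motif taken from the post


def shuffled_constructs(sequence, fixed_index=0, n=10, rng=None):
    """Return n strings with one peptide held first and the others shuffled."""
    if rng is None:
        rng = random.Random()
    peptides = sequence.split(LINKER)
    fixed = peptides[fixed_index]
    # Every peptide except the fixed one takes part in the shuffle.
    others = peptides[:fixed_index] + peptides[fixed_index + 1:]
    constructs = []
    for _ in range(n):
        rng.shuffle(others)  # in-place shuffle of the remaining peptides
        constructs.append(LINKER.join([fixed] + others))
    return constructs


# Toy input (hypothetical), not the br18.fasta sequence from the post.
toy = LINKER.join(["AAA", "CCC", "DDD", "EEE"])
for k, construct in enumerate(shuffled_constructs(toy, n=3, rng=random.Random(0))):
    print("pep%d: %s" % (k, construct))
```

Keeping the shuffled peptides in a separate `others` list avoids the repeated insert/remove calls, and passing a seeded `random.Random` makes a run reproducible.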


output:
$ random_pep.py --run br18.fasta
id: pep0
Seq:  
EELRSLYNTVATLYCVHGPGPGRDLLLIVTRIVELLGRGPGPGKRWIILGLNKIVRMYSPTSIGPGPGNTSYRLISCNTSVIGPGPGGKIILVAVHVASGYIGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGQQLLFIHFRIGCRHSRIGGPGPGELLKTVRLIKFLYQSNPGPGPGTPVNIIGRNLLTQIGGPGPGSPEVIPMFSALSEGPGPGSELYLYKVVKIEPLGVAPGPGPGSLQYLALVALVAPKKGPGPGVLAIVALVVATIIAIGPGPGTMLLGMLMICSAAGPGPGVLEWRFDSRLAFHHVGPGPGDKELYPLASLRSLFGGPGPGEAIIRILQQLLFIHF

id: pep1
Seq:  
EELRSLYNTVATLYCVHGPGPGRDLLLIVTRIVELLGRGPGPGALFYKLDVVPIDGPGPGSELYLYKVVKIEPLGVAPGPGPGKRWIILGLNKIVRMYSPTSIGPGPGVLAIVALVVATIIAIGPGPGQRPLVTIKIGGQLKEGPGPGELLKTVRLIKFLYQSNPGPGPGSLQYLALVALVAPKKGPGPGEAIIRILQQLLFIHFGPGPGVLEWRFDSRLAFHHVGPGPGTMLLGMLMICSAAGPGPGQQLLFIHFRIGCRHSRIGGPGPGNTSYRLISCNTSVIGPGPGSPEVIPMFSALSEGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYIGPGPGTPVNIIGRNLLTQIG

id: pep2
Seq:  
EELRSLYNTVATLYCVHGPGPGVLAIVALVVATIIAIGPGPGSLQYLALVALVAPKKGPGPGQRPLVTIKIGGQLKEGPGPGDKELYPLASLRSLFGGPGPGSPEVIPMFSALSEGPGPGALFYKLDVVPIDGPGPGEAIIRILQQLLFIHFGPGPGKRWIILGLNKIVRMYSPTSIGPGPGGKIILVAVHVASGYIGPGPGELLKTVRLIKFLYQSNPGPGPGSELYLYKVVKIEPLGVAPGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGRDLLLIVTRIVELLGRGPGPGVLEWRFDSRLAFHHVGPGPGQQLLFIHFRIGCRHSRIGGPGPGNTSYRLISCNTSVI

id: pep3
Seq:  
EELRSLYNTVATLYCVHGPGPGGKIILVAVHVASGYIGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGEAIIRILQQLLFIHFGPGPGQQLLFIHFRIGCRHSRIGGPGPGKRWIILGLNKIVRMYSPTSIGPGPGSELYLYKVVKIEPLGVAPGPGPGSLQYLALVALVAPKKGPGPGQRPLVTIKIGGQLKEGPGPGRDLLLIVTRIVELLGRGPGPGELLKTVRLIKFLYQSNPGPGPGSPEVIPMFSALSEGPGPGALFYKLDVVPIDGPGPGVLAIVALVVATIIAIGPGPGVLEWRFDSRLAFHHVGPGPGDKELYPLASLRSLFGGPGPGNTSYRLISCNTSVI

id: pep4
Seq:  
EELRSLYNTVATLYCVHGPGPGVLAIVALVVATIIAIGPGPGVLEWRFDSRLAFHHVGPGPGKRWIILGLNKIVRMYSPTSIGPGPGQRPLVTIKIGGQLKEGPGPGELLKTVRLIKFLYQSNPGPGPGSELYLYKVVKIEPLGVAPGPGPGALFYKLDVVPIDGPGPGSPEVIPMFSALSEGPGPGNTSYRLISCNTSVIGPGPGSLQYLALVALVAPKKGPGPGTPVNIIGRNLLTQIGGPGPGEAIIRILQQLLFIHFGPGPGQQLLFIHFRIGCRHSRIGGPGPGTMLLGMLMICSAAGPGPGRDLLLIVTRIVELLGRGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYI

id: pep5
Seq:  
EELRSLYNTVATLYCVHGPGPGSPEVIPMFSALSEGPGPGEAIIRILQQLLFIHFGPGPGTMLLGMLMICSAAGPGPGQQLLFIHFRIGCRHSRIGGPGPGELLKTVRLIKFLYQSNPGPGPGVLEWRFDSRLAFHHVGPGPGQRPLVTIKIGGQLKEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGGKIILVAVHVASGYIGPGPGDKELYPLASLRSLFGGPGPGSLQYLALVALVAPKKGPGPGNTSYRLISCNTSVIGPGPGTPVNIIGRNLLTQIGGPGPGVLAIVALVVATIIAIGPGPGSELYLYKVVKIEPLGVAPGPGPGALFYKLDVVPIDGPGPGRDLLLIVTRIVELLGR

id: pep6
Seq:  
EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGSPEVIPMFSALSEGPGPGVLEWRFDSRLAFHHVGPGPGTMLLGMLMICSAAGPGPGALFYKLDVVPIDGPGPGKRWIILGLNKIVRMYSPTSIGPGPGDKELYPLASLRSLFGGPGPGSELYLYKVVKIEPLGVAPGPGPGRDLLLIVTRIVELLGRGPGPGELLKTVRLIKFLYQSNPGPGPGNTSYRLISCNTSVIGPGPGVLAIVALVVATIIAIGPGPGEAIIRILQQLLFIHFGPGPGGKIILVAVHVASGYIGPGPGQQLLFIHFRIGCRHSRIGGPGPGQRPLVTIKIGGQLKEGPGPGSLQYLALVALVAPKK

id: pep7
Seq:  
EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGSLQYLALVALVAPKKGPGPGEAIIRILQQLLFIHFGPGPGSELYLYKVVKIEPLGVAPGPGPGVLAIVALVVATIIAIGPGPGELLKTVRLIKFLYQSNPGPGPGVLEWRFDSRLAFHHVGPGPGQQLLFIHFRIGCRHSRIGGPGPGSPEVIPMFSALSEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGNTSYRLISCNTSVIGPGPGRDLLLIVTRIVELLGRGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYI

id: pep8
Seq:  
EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGRDLLLIVTRIVELLGRGPGPGNTSYRLISCNTSVIGPGPGELLKTVRLIKFLYQSNPGPGPGSPEVIPMFSALSEGPGPGVLEWRFDSRLAFHHVGPGPGEAIIRILQQLLFIHFGPGPGDKELYPLASLRSLFGGPGPGALFYKLDVVPIDGPGPGQQLLFIHFRIGCRHSRIGGPGPGQRPLVTIKIGGQLKEGPGPGSLQYLALVALVAPKKGPGPGGKIILVAVHVASGYIGPGPGTMLLGMLMICSAAGPGPGSELYLYKVVKIEPLGVAPGPGPGVLAIVALVVATIIAIGPGPGKRWIILGLNKIVRMYSPTSI

id: pep9
Seq:  
EELRSLYNTVATLYCVHGPGPGVLEWRFDSRLAFHHVGPGPGSPEVIPMFSALSEGPGPGVLAIVALVVATIIAIGPGPGTPVNIIGRNLLTQIGGPGPGDKELYPLASLRSLFGGPGPGSELYLYKVVKIEPLGVAPGPGPGNTSYRLISCNTSVIGPGPGTMLLGMLMICSAAGPGPGGKIILVAVHVASGYIGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGRDLLLIVTRIVELLGRGPGPGEAIIRILQQLLFIHFGPGPGELLKTVRLIKFLYQSNPGPGPGSLQYLALVALVAPKKGPGPGQQLLFIHFRIGCRHSRIG

[SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGRDLLLIVTRIVELLGRGPGPGKRWIILGLNKIVRMYSPTSIGPGPGNTSYRLISCNTSVIGPGPGGKIILVAVHVASGYIGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGQQLLFIHFRIGCRHSRIGGPGPGELLKTVRLIKFLYQSNPGPGPGTPVNIIGRNLLTQIGGPGPGSPEVIPMFSALSEGPGPGSELYLYKVVKIEPLGVAPGPGPGSLQYLALVALVAPKKGPGPGVLAIVALVVATIIAIGPGPGTMLLGMLMICSAAGPGPGVLEWRFDSRLAFHHVGPGPGDKELYPLASLRSLFGGPGPGEAIIRILQQLLFIHF', id='pep0', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGRDLLLIVTRIVELLGRGPGPGALFYKLDVVPIDGPGPGSELYLYKVVKIEPLGVAPGPGPGKRWIILGLNKIVRMYSPTSIGPGPGVLAIVALVVATIIAIGPGPGQRPLVTIKIGGQLKEGPGPGELLKTVRLIKFLYQSNPGPGPGSLQYLALVALVAPKKGPGPGEAIIRILQQLLFIHFGPGPGVLEWRFDSRLAFHHVGPGPGTMLLGMLMICSAAGPGPGQQLLFIHFRIGCRHSRIGGPGPGNTSYRLISCNTSVIGPGPGSPEVIPMFSALSEGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYIGPGPGTPVNIIGRNLLTQIG', id='pep1', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGVLAIVALVVATIIAIGPGPGSLQYLALVALVAPKKGPGPGQRPLVTIKIGGQLKEGPGPGDKELYPLASLRSLFGGPGPGSPEVIPMFSALSEGPGPGALFYKLDVVPIDGPGPGEAIIRILQQLLFIHFGPGPGKRWIILGLNKIVRMYSPTSIGPGPGGKIILVAVHVASGYIGPGPGELLKTVRLIKFLYQSNPGPGPGSELYLYKVVKIEPLGVAPGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGRDLLLIVTRIVELLGRGPGPGVLEWRFDSRLAFHHVGPGPGQQLLFIHFRIGCRHSRIGGPGPGNTSYRLISCNTSVI', id='pep2', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGGKIILVAVHVASGYIGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGEAIIRILQQLLFIHFGPGPGQQLLFIHFRIGCRHSRIGGPGPGKRWIILGLNKIVRMYSPTSIGPGPGSELYLYKVVKIEPLGVAPGPGPGSLQYLALVALVAPKKGPGPGQRPLVTIKIGGQLKEGPGPGRDLLLIVTRIVELLGRGPGPGELLKTVRLIKFLYQSNPGPGPGSPEVIPMFSALSEGPGPGALFYKLDVVPIDGPGPGVLAIVALVVATIIAIGPGPGVLEWRFDSRLAFHHVGPGPGDKELYPLASLRSLFGGPGPGNTSYRLISCNTSVI', id='pep3', name='<unknown name>', description='<unknown description>', dbxrefs=[]), 
SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGVLAIVALVVATIIAIGPGPGVLEWRFDSRLAFHHVGPGPGKRWIILGLNKIVRMYSPTSIGPGPGQRPLVTIKIGGQLKEGPGPGELLKTVRLIKFLYQSNPGPGPGSELYLYKVVKIEPLGVAPGPGPGALFYKLDVVPIDGPGPGSPEVIPMFSALSEGPGPGNTSYRLISCNTSVIGPGPGSLQYLALVALVAPKKGPGPGTPVNIIGRNLLTQIGGPGPGEAIIRILQQLLFIHFGPGPGQQLLFIHFRIGCRHSRIGGPGPGTMLLGMLMICSAAGPGPGRDLLLIVTRIVELLGRGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYI', id='pep4', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGSPEVIPMFSALSEGPGPGEAIIRILQQLLFIHFGPGPGTMLLGMLMICSAAGPGPGQQLLFIHFRIGCRHSRIGGPGPGELLKTVRLIKFLYQSNPGPGPGVLEWRFDSRLAFHHVGPGPGQRPLVTIKIGGQLKEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGGKIILVAVHVASGYIGPGPGDKELYPLASLRSLFGGPGPGSLQYLALVALVAPKKGPGPGNTSYRLISCNTSVIGPGPGTPVNIIGRNLLTQIGGPGPGVLAIVALVVATIIAIGPGPGSELYLYKVVKIEPLGVAPGPGPGALFYKLDVVPIDGPGPGRDLLLIVTRIVELLGR', id='pep5', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGSPEVIPMFSALSEGPGPGVLEWRFDSRLAFHHVGPGPGTMLLGMLMICSAAGPGPGALFYKLDVVPIDGPGPGKRWIILGLNKIVRMYSPTSIGPGPGDKELYPLASLRSLFGGPGPGSELYLYKVVKIEPLGVAPGPGPGRDLLLIVTRIVELLGRGPGPGELLKTVRLIKFLYQSNPGPGPGNTSYRLISCNTSVIGPGPGVLAIVALVVATIIAIGPGPGEAIIRILQQLLFIHFGPGPGGKIILVAVHVASGYIGPGPGQQLLFIHFRIGCRHSRIGGPGPGQRPLVTIKIGGQLKEGPGPGSLQYLALVALVAPKK', id='pep6', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGTMLLGMLMICSAAGPGPGSLQYLALVALVAPKKGPGPGEAIIRILQQLLFIHFGPGPGSELYLYKVVKIEPLGVAPGPGPGVLAIVALVVATIIAIGPGPGELLKTVRLIKFLYQSNPGPGPGVLEWRFDSRLAFHHVGPGPGQQLLFIHFRIGCRHSRIGGPGPGSPEVIPMFSALSEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGNTSYRLISCNTSVIGPGPGRDLLLIVTRIVELLGRGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGDKELYPLASLRSLFGGPGPGGKIILVAVHVASGYI', id='pep7', name='<unknown name>', description='<unknown description>', dbxrefs=[]), 
SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGTPVNIIGRNLLTQIGGPGPGRDLLLIVTRIVELLGRGPGPGNTSYRLISCNTSVIGPGPGELLKTVRLIKFLYQSNPGPGPGSPEVIPMFSALSEGPGPGVLEWRFDSRLAFHHVGPGPGEAIIRILQQLLFIHFGPGPGDKELYPLASLRSLFGGPGPGALFYKLDVVPIDGPGPGQQLLFIHFRIGCRHSRIGGPGPGQRPLVTIKIGGQLKEGPGPGSLQYLALVALVAPKKGPGPGGKIILVAVHVASGYIGPGPGTMLLGMLMICSAAGPGPGSELYLYKVVKIEPLGVAPGPGPGVLAIVALVVATIIAIGPGPGKRWIILGLNKIVRMYSPTSI', id='pep8', name='<unknown name>', description='<unknown description>', dbxrefs=[]), SeqRecord(seq='EELRSLYNTVATLYCVHGPGPGVLEWRFDSRLAFHHVGPGPGSPEVIPMFSALSEGPGPGVLAIVALVVATIIAIGPGPGTPVNIIGRNLLTQIGGPGPGDKELYPLASLRSLFGGPGPGSELYLYKVVKIEPLGVAPGPGPGNTSYRLISCNTSVIGPGPGTMLLGMLMICSAAGPGPGGKIILVAVHVASGYIGPGPGALFYKLDVVPIDGPGPGQRPLVTIKIGGQLKEGPGPGKRWIILGLNKIVRMYSPTSIGPGPGRDLLLIVTRIVELLGRGPGPGEAIIRILQQLLFIHFGPGPGELLKTVRLIKFLYQSNPGPGPGSLQYLALVALVAPKKGPGPGQQLLFIHFRIGCRHSRIG', id='pep9', name='<unknown name>', description='<unknown description>',  
dbxrefs=[])]
Traceback (most recent call last):
  File "/home/ferreirafm/bin/random_pep.py", line 173, in <module>
    main()
  File "/home/ferreirafm/bin/random_pep.py", line 156, in main
    random_seq(fastafile)
  File "/home/ferreirafm/bin/random_pep.py", line 39, in random_seq
    SeqIO.write(records, outf, "fasta")
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/__init__.py", line 412, in write
    count = writer_class(handle).write_file(sequences)
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 271, in write_file
    count = self.write_records(records)
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 256, in write_records
    self.write_record(record)
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 136, in write_record
    data = self._get_seq_string(record) #Catches sequence being None
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 164, in _get_seq_string
    % record.id)
TypeError: SeqRecord (id=pep0) has an invalid sequence.
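
The printed list shows `seq='EELR...'`, a plain string, and the TypeError is consistent with SeqIO.write rejecting a record whose sequence is not a Seq object. A minimal sketch of one likely fix, assuming that is the cause here: wrap the joined string in `Seq` before creating the `SeqRecord`. The toy peptides and the in-memory `StringIO` handle are stand-ins for the real data and for test.fasta, and the sketch targets Python 3 with a current Biopython.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

peptides = ["AAA", "CCC", "DDD"]  # toy peptides, not the data from the post

# Wrap the joined string in Seq so the record carries a real sequence object.
record = SeqRecord(Seq("GPGPG".join(peptides)), id="pep0", description="")

handle = StringIO()  # in-memory handle standing in for test.fasta
count = SeqIO.write([record], handle, "fasta")
print("records written:", count)
print(handle.getvalue())
```

Passing `description=""` is optional; it is used here so that no placeholder description text ends up on the FASTA header line.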



Quoting Peter Cock <p.j.a.cock at googlemail.com>:

> On Wed, Apr 4, 2012 at 8:56 PM,  <ferreirafm at usp.br> wrote:
>>
>> Hi Peter,
>> Thanks for helping. I'll try something like that and let you know the
>> results.
>> Fred
>
> Good luck - and please reply on the list to let us know how you get on :)
>
> Peter
>





