[Biopython] change rec.id problems

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 25 01:20:09 UTC 2013


On Mon, Jun 24, 2013 at 10:53 PM, Frederico Moraes Ferreira
<ferreirafm at usp.br> wrote:
> Hi list,
> I'm trying to change the rec.id so that the file name replaces the beginning
> of the id string.
> The code is as follows:
>
>     for inf in inflist:
>         rec = SeqIO.read(open(inf, "rU"), "fasta")

You can shorten that to:

        rec = SeqIO.read(inf, "fasta")

This is also better in that it explicitly closes the file handle
(which is a good habit to adopt).
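For comparison, here is a minimal sketch of the two forms, assuming Python 3 and a current Biopython. The temporary file and its contents are made up so the example runs on its own. Passing the filename lets SeqIO open and close the file itself. If you prefer to manage the handle yourself, a `with` block closes it for you. (The "rU" mode is unnecessary in Python 3, and open() no longer accepts it as of Python 3.11.)

```python
import os
import tempfile

from Bio import SeqIO

# Write a tiny single-record FASTA file (made-up example data) so this runs on its own.
filename = os.path.join(tempfile.mkdtemp(), "example.fasta")
with open(filename, "w") as out:
    out.write(">emm52.pep|166|Type:P\nGTASVAVGLTVVGAGLASQ\n")

# Option 1: pass the filename; SeqIO opens and closes the file itself.
rec = SeqIO.read(filename, "fasta")

# Option 2: manage the handle yourself; the with block closes it on exit.
with open(filename) as handle:
    rec2 = SeqIO.read(handle, "fasta")

print(rec.id, rec2.id)
```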

>         if inf[:-5] != rec.id.split('|')[0][:-3]:
>             print rec.id
>             rec.id = '%spep|%s' % (inf[:-5], '|'.join(rec.id.split('|')[1:]))
>             print rec.id
>             outf = '.'.join(inf.split('.')[:-1]) + '_new.fasta'
>             SeqIO.write(rec, outf, 'fasta')
>
> Judging by the prints below, the program seems to be working fine.
>
> ####output########
> emm52.pep|166|Type:P
> emm52.0.pep|166|Type:P
> emm5-21.pep|178|Type:P
> emm5.21.pep|178|Type:P
> emm52-1.pep|240|Type:P
> emm52.1.pep|240|Type:P
> emm5-22.pep|219|Type:P
> emm5.22.pep|219|Type:P
> emm5-23.pep|231|Type:P
> emm5.23.pep|231|Type:P
> emm5-24.pep|157|Type:P
> emm5.24.pep|157|Type:P
> emm5-25.pep|110|Type:P
>
> However, in the file the new and old ids were concatenated.
>
>>emm52.0.pep|166|Type:P emm52.pep|166|Type:P <unknown description>
> GTASVAVGLTVVGAGLASQTEVKADQPVDHHRYTEANDAVLQGRTVSARALLHEINKNGQ
> LRSENEELKADLQKKEQELKNLNDDVKKLNDEVALERLKNERHVHDEEVELERLKNERHD
> HDKKEAERKALEDKLADKQEHLDGALRYINEKEAERKEKEAEQKKL
>
> Am I doing something wrong?
> All the best,
> Fred

The FASTA parser and writer have some non-obvious behaviour with
regard to the contents of the ">" line and the id/name/description.
(With hindsight I would have done this differently, but instead it
followed what the older Biopython parser did.)
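To make this concrete, here is a simplified pure-Python model of the ">" line handling. It is not Biopython's own code, just a sketch of how current versions behave. The parser takes the first word of the ">" line as the id and keeps the whole line as the description. The writer outputs the id followed by the description, unless the description already starts with the id. The input header below is an assumption chosen to match the output in the question.

```python
def parse_title(title):
    """Model how the FASTA parser splits a '>' line (given without the '>')."""
    record_id = title.split(None, 1)[0]  # first whitespace-delimited word
    return record_id, title  # (id, description): description is the whole line


def write_title(record_id, description):
    """Model how the FASTA writer rebuilds the '>' line from id and description."""
    if description and description.split(None, 1)[0] == record_id:
        return description  # description already begins with the id
    if description:
        return f"{record_id} {description}"  # id, then the (old) description
    return record_id  # no description: just the id


record_id, description = parse_title("emm52.pep|166|Type:P <unknown description>")
record_id = "emm52.0.pep|166|Type:P"  # change only the id, as in the question
print(">" + write_title(record_id, description))
```

Changing only the id leaves the old title in the description, so the model prints the same concatenated header as in the file above.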

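Add some print statements to look at all of those values (rec.id,
rec.name and rec.description). You may wish to explicitly set the
record's description as well, since it still holds the whole original
">" line, old id included.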
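A minimal sketch of the fix, assuming Python 3 and a recent Biopython. The record is parsed from an in-memory string rather than from Fred's files, and the header is a made-up example matching the question. Set the description along with the id (here to an empty string, so only the new id is written), then write the record out.

```python
from io import StringIO

from Bio import SeqIO

# Parse a one-record FASTA string (made-up example mirroring the question).
fasta_text = ">emm52.pep|166|Type:P <unknown description>\nGTASVAVGLTVVGAGLASQ\n"
rec = SeqIO.read(StringIO(fasta_text), "fasta")
print(repr(rec.id), repr(rec.name), repr(rec.description))  # inspect all three

# Replace the id, and clear the description so the old title is not written again.
rec.id = "emm52.0.pep|166|Type:P"
rec.description = ""

out = StringIO()
SeqIO.write(rec, out, "fasta")
print(out.getvalue().splitlines()[0])  # the new ">" line
```

If you would rather keep the rest of the original title, set the description to the new id followed by that text instead. The writer then outputs the description alone, because it begins with the id.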

Peter
