[Biopython] Strange Gaps when writing Multi-Fasta

Brett Bowman bnbowman at gmail.com
Thu Feb 24 17:41:50 UTC 2011


I'm trying to write my own script to parse multiple alignments from
the new standalone PSI-BLAST output, but when I write the results to
a file I get very odd output.  I'm storing the sequence data in a
dictionary, using the sequence ID as the key and the sequence itself
as the value.  However, when I write these results to a file, one of
the sequences comes out with an extra blank line in the middle, as
you can see in sample_seq_1.  This happens even though the same
sequence, printed as a string, comes out as a single, contiguous line,
as you can see in sample_seq_2.

I've tried both building a list of sequences and writing them to a
file with SeqIO, and writing them out myself with a for loop such
as:
for i in range(0, len(seq), 60):
    # print the sequence 60 characters per line
    print(seq[i:i+60])
and still I get the same problem.  What's weird is that, although
each method consistently puts the extra line into the same sequence,
the two methods disagree about which one: SeqIO always puts an extra
line into SeqA, but the for loop above always puts it into SeqB.
What makes this doubly weird is that when I copy the sequence in
question by hand to another file, the extra line disappears as if it
was never there!  It's almost as if that extra line is a glitch, or a
non-standard whitespace character or something.
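One way to test the "non-standard whitespace character" hunch is to scan each stored sequence for anything other than letters, gaps ('-') and stop symbols ('*'), and print the offenders with repr() so invisible characters such as '\r', '\n' or '\xa0' become visible. This is a minimal sketch, assuming the sequences are plain Python strings; `report_odd_chars` and the allowed alphabet are hypothetical choices, not part of Biopython.

```python
import string

# Assumed alphabet for an alignment row: letters, gaps and stop symbols.
ALLOWED = set(string.ascii_letters + "-*")

def report_odd_chars(seq_id, seq):
    """Print and return (position, char) for every char outside ALLOWED."""
    odd = [(i, ch) for i, ch in enumerate(seq) if ch not in ALLOWED]
    for i, ch in odd:
        # %r (repr) shows invisible characters as escapes, e.g. '\r'.
        print("%s: position %d: %r" % (seq_id, i, ch))
    return odd

# Usage: a made-up toy sequence with a carriage return hidden inside it.
report_odd_chars("toy", "GAGHV\rIQCLK")
```

For the dictionary described above, a loop such as `for seq_id, seq in seqs.items(): report_odd_chars(seq_id, seq)` (where `seqs` is a hypothetical name for that dictionary) would check every entry; an empty result for every sequence would argue against the hidden-character explanation.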

This has me completely stumped.  Does anyone have any ideas as to why
this is happening, or how to fix it?

-Brett Bowman
Senior Research Associate
Cibus US LLC
-------------- next part --------------
>YP_002749131 <unknown description>
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-----------------------------------------GAGHVIQCLKKLGVTTVFG
YPGGAILPVYDALY-E-SG----L----KHILTRHEQAAIHAAEGYARASGKVGVVFATS
GPGATNLVTGLADAYMDSIPLVVITGQVATPLIGKDGFQEADVVGITVPVTKHNYQVRDV
NQLSRIVQEAFYIAESGRPGPVLIDIPKD-V---Q----IE---K---V---T----S--
--F------Y--N---EV--I---E--I--P---G--Y------K---IED---MP----
D-S-M---K-L-KEV---AK---EISKAKRPLLY--IGGG--V--I--H----SGG--S-
-D--E----LIKFAREHRI--PVVSTLMGLGAYPPG----------D-S-LFLGMLGMHG

TYAANMAVTECNLLLALGVRFDDRVTGKLELFSPQS-K-KV-HIDIDSSEFHKNVTVEYP
VVGDVK-NA----L----H---M-L---L------H-MPI-----D-T------------
-Q----T-----D----E----W---L---T----K----I----E---G-----WKEEY
--------PLSY-N--QK-E---R-E-LKPQHVI-SLV-SE-L---T-N-G--E----AI
-VTTEVGQHQMWAAHFYKAKNPRTFLTSGGLGTMGFGFPAAIGAQLA------KEEQLVI
CIAGDASFQMNIQELQTVAENNIPVKVFIINNKFLGMVRQWQEMFYENRLSESKI-----
------G--------------S--------------------------------------
-------------------------------P-DFVKVAEAYGVKGLRATNSTEAK-QVM
---LEAFA-HE-G-PVVVDFCVEEG-------------EYVFPMVPPNKGNNEMIMK---
-------
-------------- next part --------------
>YP_002749131
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GAGHVIQCLKKLGVTTVFGYPGGAILPVYDALY-E-SG----L----KHILTRHEQAAIHAAEGYARASGKVGVVFATSGPGATNLVTGLADAYMDSIPLVVITGQVATPLIGKDGFQEADVVGITVPVTKHNYQVRDVNQLSRIVQEAFYIAESGRPGPVLIDIPKD-V---Q----IE---K---V---T----S----F------Y--N---EV--I---E--I--P---G--Y------K---IED---MP----D-S-M---K-L-KEV---AK---EISKAKRPLLY--IGGG--V--I--H----SGG--S--D--E----LIKFAREHRI--PVVSTLMGLGAYPPG----------D-S-LFLGMLGMHGTYAANMAVTECNLLLALGVRFDDRVTGKLELFSPQS-K-KV-HIDIDSSEFHKNVTVEYPVVGDVK-NA----L----H---M-L---L------H-MPI-----D-T-------------Q----T-----D----E----W---L---T----K----I----E---G-----WKEEY--------PLSY-N--QK-E---R-E-LKPQHVI-SLV-SE-L---T-N-G--E----AI-VTTEVGQHQMWAAHFYKAKNPRTFLTSGGLGTMGFGFPAAIGAQLA------KEEQLVICIAGDASFQMNIQELQTVAENNIPVKVFIINNKFLGMVRQWQEMFYENRLSESKI-----------G--------------S---------------------------------------------------------------------P-DFVKVAEAYGVKGLRATNSTEAK-QVM---LEAFA-HE-G-PVVVDFCVEEG-------------EYVFPMVPPNKGNNEMIMK----------

