[BioPython] Parsing BLAST

Alex Garbino agarbino at gmail.com
Fri Aug 29 15:39:22 UTC 2008


>> I'm now almost done. My script is to take a fasta file, run blast, and
>> output a comma-separated-values list in the following format:
>> AccessionID, Source, Length, FASTA sequence.
>
> FASTA sequence format looks like this:
>
> >name and description
> CATACGACTACGTCAACGATCCGAACT
> GACTACGATCAGCATCGACTAGCTGTG
> GTGTGGT
> >name2 and second sequence description
> AGCGACAGCGACGAGCAGCGACGAG
> AGCGAGC
>
> It's not something you can squeeze into a comma-separated file.  I
> think you might just mean getting the sequence itself - or have two
> files (one CSV, one FASTA).
>
> Peter
>

That's the problem I'm having... I want to keep the FASTA format (so I
can plug it into ClustalW, etc.), which is difficult because of the
newline after the FASTA title line.
When I did this manually in Excel, I could fit the whole FASTA record
into one cell. I think it was converted to a string (when I
copy-pasted it into ClustalW, it came out wrapped in " ").
Is there a way to ignore the newline between the description and the
sequence?

Thanks,
Alex
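
A minimal sketch of the single-file option, assuming Python's standard
csv module is used to write the output. The csv writer quotes any field
that contains a newline, which matches the " " behaviour seen in Excel,
so a whole FASTA record can sit in one cell and come back unchanged.
The accession, source and the fasta_text helper are hypothetical and
only for illustration; the sequence is the example quoted above.

```python
import csv
import io


def fasta_text(title, sequence, width=60):
    """Format one record as FASTA text: a '>' title line, then wrapped sequence lines."""
    lines = [">" + title]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


# Hypothetical BLAST hit; the accession and source are made up.
accession = "ABC12345"
source = "Example organism"
sequence = (
    "CATACGACTACGTCAACGATCCGAACT"
    "GACTACGATCAGCATCGACTAGCTGTG"
    "GTGTGGT"
)

record = fasta_text(accession + " " + source, sequence, width=27)

# Write one CSV row: AccessionID, Source, Length, FASTA record.
# The FASTA field contains newlines, so csv.writer wraps it in double quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow([accession, source, len(sequence), record])
csv_text = buffer.getvalue()

# Read it back: the quoted field is returned as one string, newlines intact.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
round_trip = rows[0][3]
```

The round_trip string could then be written straight to a .fasta file
for ClustalW. If the CSV has to be read by tools that do not handle
quoted newlines, the two-file approach (one CSV, one FASTA) is probably
the safer choice.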


