[BioPython] Parsing BLAST

Alex Garbino agarbino at gmail.com
Fri Aug 29 16:10:00 UTC 2008


Assuming I just stick to making the plain sequence the 4th variable
(instead of FASTA format), how should I add it to my dictionary?
Doing:

output[x].extend(record.seq.tostring())

adds each letter individually, so each entry ends up with a few hundred
elements, rather than the fourth element being the full string. join()
doesn't seem to be the answer...
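
For reference, the behaviour described above comes from list.extend()
iterating over its argument, which for a string means one element per
character; list.append() adds the whole string as a single element. A
minimal sketch, assuming each dictionary value is a list of
[accession, source, length]. The key, the field values, and the plain
string standing in for record.seq.tostring() are all made up for
illustration (newer Biopython releases use str(record.seq) instead of
tostring()):

```python
# Hypothetical stand-ins: in the real script the key would identify a BLAST
# hit and the sequence would come from record.seq.
output = {"hit1": ["ABC123", "Homo sapiens", 12]}
sequence = "CATACGACTACG"

# extend() iterates over its argument, so a string is added letter by letter.
by_extend = list(output["hit1"])   # work on a copy to show the difference
by_extend.extend(sequence)

# append() adds its argument as one element, which is the wanted result.
output["hit1"].append(sequence)

print(len(by_extend))       # 3 original fields + 12 letters
print(output["hit1"])       # 4 fields, the last one the full sequence
```

Wrapping the string in a list, as in extend([sequence]), would give the
same result as append(sequence).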

Thanks,
Alex


On Fri, Aug 29, 2008 at 10:39 AM, Alex Garbino <agarbino at gmail.com> wrote:
>>> I'm now almost done. My script is to take a fasta file, run blast, and
>>> output a comma-separated-values list in the following format:
>>> AccessionID, Source, Length, FASTA sequence.
>>
>> FASTA sequence format looks like this:
>>
>> >name and description
>> CATACGACTACGTCAACGATCCGAACT
>> GACTACGATCAGCATCGACTAGCTGTG
>> GTGTGGT
>> >name2 and second sequence description
>> AGCGACAGCGACGAGCAGCGACGAG
>> AGCGAGC
>>
>> It's not something you can squeeze into a comma-separated file.  I
>> think you might just mean getting the sequence itself - or have two
>> files (one CSV, one FASTA).
>>
>> Peter
>>
>
> That's the problem I'm having... I want to keep FASTA format (so I can
> plug it into ClustalW, etc), which is difficult to do because of the
> newline after the FASTA title.
> Manually in Excel, I could fit the whole FASTA record into a cell. I
> think it was converted to a string (when I copy-pasted it into
> ClustalW, it would be in " ").
> Is there a way to ignore the newline between description and sequence?
>
> Thanks,
> Alex
>
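
On the quoted question about the newline: one possible approach, not
something proposed in the thread, is to let Python's standard csv module
quote the field. A field containing a newline is then written inside
" ", much like the Excel behaviour described above, and reads back
unchanged. A minimal sketch, with the accession, source, and sequence
invented for illustration:

```python
import csv
import io

# Hypothetical record: a whole FASTA entry, newline included, as the 4th field.
fasta_text = ">name and description\nCATACGACTACGTCAACGATCCGAACT\n"
row = ["ABC123", "Homo sapiens", 27, fasta_text]

# Write to an in-memory buffer; the writer quotes the field with the newline.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
print(buffer.getvalue())

# Reading the row back restores the FASTA text, embedded newlines and all.
buffer.seek(0)
restored = next(csv.reader(buffer))
print(restored[3] == fasta_text)
```

When writing to a real file instead of a buffer, the csv documentation
recommends opening it with newline="" so embedded newlines are not
altered. Whether other tools accept such a multi-line cell is a separate
matter; keeping a CSV file and a FASTA file side by side, as suggested
in the quoted reply, avoids the question entirely.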


