[Biopython] Tutorial Question 7.4 alignment.title

Peter biopython at maubp.freeserve.co.uk
Fri Oct 8 15:56:26 UTC 2010


On Fri, Oct 8, 2010 at 4:45 PM, Ara Kooser <akooser at unm.edu> wrote:
> Peter,
>
> Thanks for your reply. I started to fiddle around with parsing the string
> last night but haven't made much progress.
>
> At the moment the output looks like this:
>
> ****Alignment****
> sequence: gi|302529614|ref|ZP_07281956.1| predicted protein [Streptomyces sp. AA4] >gi|302438509|gb|EFL10325.1| predicted protein [Streptomyces sp. AA4]
> e value: 1.89229e-46
> length: 1109
> start: 7
> end: 414
>
> So what I want from the sequence string is the following:
> [Streptomyces sp. AA4]
> ZP_07281956.1
>
> printed out as separated lines like the rest of the output.

You could do this with regular expressions (import re), or with some simple
Python string searching for the square brackets, etc.
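As a rough sketch of the regular expression route, assuming the title
follows the usual NCBI "gi|number|database|accession|" layout shown above
(the function name parse_title is made up for illustration, not part of
Biopython):

```python
import re


def parse_title(title):
    """Return (organism, accession) for the first entry in a BLAST hit title.

    Hypothetical helper; assumes NCBI-style titles such as
    'gi|302529614|ref|ZP_07281956.1| predicted protein [Streptomyces sp. AA4] >gi|...'
    Returns None for any part that cannot be found.
    """
    # Merged (redundant) entries are separated by '>', so keep only the first
    first = title.split(">")[0]
    # Organism: the text inside the first pair of square brackets
    org_match = re.search(r"\[([^\]]+)\]", first)
    # Accession: the field after 'gi|<number>|<db tag>|', up to the next '|'
    acc_match = re.search(r"gi\|\d+\|\w+\|([^|]+)\|", first)
    organism = org_match.group(1) if org_match else None
    accession = acc_match.group(1) if acc_match else None
    return organism, accession


title = ("gi|302529614|ref|ZP_07281956.1| predicted protein "
         "[Streptomyces sp. AA4] >gi|302438509|gb|EFL10325.1| "
         "predicted protein [Streptomyces sp. AA4]")
organism, accession = parse_title(title)
print("[%s]" % organism)  # prints [Streptomyces sp. AA4]
print(accession)          # prints ZP_07281956.1
```

Only the first entry of a merged title is used here; if you want the
other gi numbers as well, loop over all the pieces from title.split(">").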

> After that is figured out I want to put all the information in columns so it
> can be read into a spreadsheet in OO so that it looks like this:
> Name    Locus # E_value Length  Start   End

It would be much simpler to ask BLAST to give you tabular output.
If you are using BLAST+ you can even specify which columns you
want (although this won't pull out the organism name for you).
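A minimal sketch of reading such tabular output back into Python, assuming
BLAST+ was run with something like
-outfmt "6 sseqid evalue slen sstart send" (that column choice and the
function name read_blast_tabular are my own illustration; I've guessed
slen/sstart/send for your Length/Start/End, so swap in length, qstart or
qend if you meant alignment length or query coordinates):

```python
# Hypothetical column choice: must match, in the same order, the
# specifiers given to BLAST+, e.g.
#   blastp -query query.fasta -db nr -out hits.tsv \
#          -outfmt "6 sseqid evalue slen sstart send"
COLUMNS = ["sseqid", "evalue", "slen", "sstart", "send"]

# How to convert each column from text; unlisted columns stay as strings
CONVERTERS = {"evalue": float, "slen": int, "sstart": int, "send": int}


def read_blast_tabular(handle, columns=COLUMNS):
    """Yield one dict per hit line of BLAST+ tabular output (-outfmt 6 or 7)."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and the '#' comment lines that -outfmt 7 adds
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield {name: CONVERTERS.get(name, str)(value)
               for name, value in zip(columns, fields)}
```

The file itself is already tab separated, so OpenOffice should be able to
import it directly; the parser is only needed if you want to post-process
the hits in Python, e.g. by looping over read_blast_tabular(open("hits.tsv")).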

Peter



