[Biopython] general scripting help

Thu Oct 10 22:56:04 UTC 2013

Module glob can help here too.

import glob
for filename in glob.glob('*.xml'):
    extract_alignments(filename)

You do not have to worry about "real OS".

Cheers,
Nam

On Thu, Oct 10, 2013 at 3:53 PM, Nat Echols <nathaniel.echols at gmail.com>wrote:

> Put all that code into a function with the file name or prefix as an
> argument, then iterate over possible files:
>
> def extract_alignments (prefix) :
>   # old code goes here - append ".xml" to prefix to get alignment file name
>
> for prefix in ["xaa","xab","xac"] :
>   extract_alignments(prefix)
>
> Or you could do this:
>
> import os.path
> for file_name in sys.argv[1:] :
>   prefix = os.path.splitext(file_name)[0]
>   extract_alignments(prefix)
>
> And run as:
>
> python my_script.py x*.xml
>
> Assuming you have a real OS installed, of course - I'm not sure whether
> Windows supports wildcards too.
>
> -Nat
>
>
>
> On Thu, Oct 10, 2013 at 3:36 PM, David Shin <davidsshin at lbl.gov> wrote:
>
> > Hi -
> >
> > I am trying to write a script to parse through 50 or so deltablast .xml
> > files.
> >
> > File names are:
> > xaa.xml
> > xab.xml
> > xac.xml
> > ...
> >
> >
> > I'm new (2 days) to python, biopython, and just trying to have something
> to
> > show for a meeting tomorrow. I have my script working well enough for one
> > file, I was wondering if there was a way to go thru each file separately
> > and output according to file name.
> >
> > ie. I'm trying to replace "xaa" in lines 3 and 12 with a wildcard like
> x??
> > or even x*
> >
> >
> > # This part gets the length of the query and stores to a variable
> > from Bio import SeqIO
> > record = SeqIO.read("xaa", "fasta")
> > query_length = len(record)
> > #print "query length:", query_length
> >
> > #This part gets the user's high and low percent identity cutoffs
> > high_percent_cutoff = float(input("Enter high percent cutoff: "))
> > low_percent_cutoff = float(input("Enter low percent cutoff: "))
> >
> > # This part does the comparison to all the hits if
> > result_handle = open("xaa.xml")
> > from Bio.Blast import NCBIXML
> > blast_record = NCBIXML.read(result_handle)
> >
> > for alignment in blast_record.alignments:
> >     for hsp in alignment.hsps:
> >          alignment_length = alignment.length
> >          identical_residues = hsp.identities
> >          percent_identity = float(identical_residues) /
> float(query_length)
> >          if alignment_length > query_length * 0.9 and alignment_length <
> > query_length * 1.1 and percent_identity > low_percent_cutoff and
> > percent_identity <= high_percent_cutoff:
> >               print "****Alignment****"
> >               print "sequence:", alignment.title
> >               print "query length:", query_length
> >               print "alighment length:", alignment.length
> >               print "identical residues:", identical_residues
> >               print "percent identity:", percent_identity
> >               print
> >               print
> >
> >
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
> >               print hsp.query[:80]
> >               print hsp.match[:80]
> >               print hsp.sbjct[:80]
> >
> >
> > Thanks for any help.
> > Dave
> > _______________________________________________
> > Biopython mailing list  -  Biopython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>