[Biopython] general scripting help

Fri Oct 11 00:24:14 UTC 2013

Thanks everyone, I'm going to try to get each to work. Also, I meant to say
I've been on biopython 2 days... but have been trying to do general python
tutorials for a couple of weeks now, and I would have never come up with
those suggestions. Thanks again, and thanks for the link also.

On Thu, Oct 10, 2013 at 3:56 PM, Nam Nguyen <bitsink at gmail.com> wrote:

> Module glob can help here too.
>
> import glob
> for filename in glob.glob('*.xml'):
>     extract_alignments(filename)
>
> You do not have to worry about "real OS".
>
> Cheers,
> Nam
>
>
> On Thu, Oct 10, 2013 at 3:53 PM, Nat Echols <nathaniel.echols at gmail.com>wrote:
>
>> Put all that code into a function with the file name or prefix as an
>> argument, then iterate over possible files:
>>
>> def extract_alignments (prefix) :
>>   # old code goes here - append ".xml" to prefix to get alignment file
>> name
>>
>> for prefix in ["xaa","xab","xac"] :
>>   extract_alignments(prefix)
>>
>> Or you could do this:
>>
>> import os.path
>> for file_name in sys.argv[1:] :
>>   prefix = os.path.splitext(file_name)[0]
>>   extract_alignments(prefix)
>>
>> And run as:
>>
>> python my_script.py x*.xml
>>
>> Assuming you have a real OS installed, of course - I'm not sure whether
>> Windows supports wildcards too.
>>
>> -Nat
>>
>>
>>
>> On Thu, Oct 10, 2013 at 3:36 PM, David Shin <davidsshin at lbl.gov> wrote:
>>
>> > Hi -
>> >
>> > I am trying to write a script to parse through 50 or so deltablast .xml
>> > files.
>> >
>> > File names are:
>> > xaa.xml
>> > xab.xml
>> > xac.xml
>> > ...
>> >
>> >
>> > I'm new (2 days) to python, biopython, and just trying to have
>> something to
>> > show for a meeting tomorrow. I have my script working well enough for
>> one
>> > file, I was wondering if there was a way to go thru each file separately
>> > and output according to file name.
>> >
>> > ie. I'm trying to replace "xaa" in lines 3 and 12 with a wildcard like
>> x??
>> > or even x*
>> >
>> >
>> > # This part gets the length of the query and stores to a variable
>> > from Bio import SeqIO
>> > record = SeqIO.read("xaa", "fasta")
>> > query_length = len(record)
>> > #print "query length:", query_length
>> >
>> > #This part gets the user's high and low percent identity cutoffs
>> > high_percent_cutoff = float(input("Enter high percent cutoff: "))
>> > low_percent_cutoff = float(input("Enter low percent cutoff: "))
>> >
>> > # This part does the comparison to all the hits if
>> > result_handle = open("xaa.xml")
>> > from Bio.Blast import NCBIXML
>> > blast_record = NCBIXML.read(result_handle)
>> >
>> > for alignment in blast_record.alignments:
>> >     for hsp in alignment.hsps:
>> >          alignment_length = alignment.length
>> >          identical_residues = hsp.identities
>> >          percent_identity = float(identical_residues) /
>> float(query_length)
>> >          if alignment_length > query_length * 0.9 and alignment_length <
>> > query_length * 1.1 and percent_identity > low_percent_cutoff and
>> > percent_identity <= high_percent_cutoff:
>> >               print "****Alignment****"
>> >               print "sequence:", alignment.title
>> >               print "query length:", query_length
>> >               print "alighment length:", alignment.length
>> >               print "identical residues:", identical_residues
>> >               print "percent identity:", percent_identity
>> >               print
>> >               print
>> >
>> >
>> "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
>> >               print hsp.query[:80]
>> >               print hsp.match[:80]
>> >               print hsp.sbjct[:80]
>> >
>> >
>> > Thanks for any help.
>> > Dave
>> > _______________________________________________
>> > Biopython mailing list  -  Biopython at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biopython
>> >
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>
>

-- 
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA