[BioPython] trying to make NBRF dictionary

Brad Chapman chapmanb at uga.edu
Wed Mar 17 21:32:28 EST 2004


Hi Ashleigh;

> Hello.  As there seems to be no existing Bio.Fasta-style dictionary code
> for alignments (Clustalw or NBRF), I thought I'd try to write a simple
> script using the NBRF iterator to make a dictionary of sequence
> name:sequence key:value pairs.

Okay, this makes good sense.

> I'm stuck already.  My code seems to make a dictionary of sorts, 
> but it behaves like it only
> has 1 key:value pair rather than 4 (len(mydict) returns 1) and the keys 
> are just my variable name (cur_record.sequence_name), not what I think
> the keys should be - the actual data I put into the dictionary.  I'm
> guessing that means I have some scope problem.

Yes, I think you're right. The output you gave seems to be what you
actually want (or at least what you describe you want above) but the
code itself does contain a bit of confusion with the mydict
dictionary, so it's probably something in the code that we don't see
in the example.


> mydict={}
>  
> def makedict(file1):
>      parser=NBRF.RecordParser()
>      first_file=open(file1, 'r')
>      iterator=NBRF.Iterator(first_file, parser)
>       
>      while 1:
>          cur_record=iterator.next()
>          if cur_record is None:
>              break
>          name=cur_record.sequence_name
>          sequence=cur_record.sequence.data
>          mydict[name] = sequence
>           
>      return mydict

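Okay, the major confusion here is that mydict should be internal to
the makedict function. As posted, the function fills the module-level
mydict (an item assignment like mydict[name] = sequence does not
raise an UnboundLocalError), so every call adds to the same shared
dictionary. I'm not exactly sure what else is going on in your script,
but I'm guessing your function should look like: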

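from Bio import NBRF  # assumed import, not shown in the posted code

def makedict(file1):
    parser = NBRF.RecordParser()
    first_file = open(file1, 'r')
    iterator = NBRF.Iterator(first_file, parser)
    # created inside the function, so every call starts with an empty dict
    mydict = {}

    while 1:
        cur_record = iterator.next()
        # the iterator returns None once there are no more records
        if cur_record is None:
            break
        name = cur_record.sequence_name
        sequence = cur_record.sequence.data
        mydict[name] = sequence

    first_file.close()
    return mydict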

Then you should be able to call it without any problem doing
something like:

file1_dict = makedict("my_file1.nbrf")
file2_dict = makedict("my_file2.nbrf")

From the problems you are describing, it sounds like you are doing
something where you reassign mydict because it is used both
inside and outside the function.
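
To show what I mean, here is a minimal sketch of the shared-dictionary
problem. It does not use the NBRF parser at all: the record lists and
the names shared and makedict_shared are made up for illustration, and
plain (name, sequence) tuples stand in for parsed records.

```python
# Hypothetical stand-ins for the records parsed from two files.
FILE1_RECORDS = [("seq1", "MKV"), ("seq2", "MLA")]
FILE2_RECORDS = [("seq3", "MTT")]

shared = {}  # module-level dictionary, visible inside the function


def makedict_shared(records):
    # Item assignment modifies the module-level dictionary in place,
    # so entries from earlier calls are still there on later calls.
    for name, sequence in records:
        shared[name] = sequence
    return shared


file1_dict = makedict_shared(FILE1_RECORDS)
file2_dict = makedict_shared(FILE2_RECORDS)

print(file1_dict is file2_dict)  # True: both names point at one dictionary
print(len(file1_dict))           # 3: entries from both "files" are mixed
```

Both results are the same object, so anything done to one of them
(clearing it, reassigning keys) also shows up in the other.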

One of the major problems with using functions (definitely forgive
me if I'm being too simplistic here) is not having a good grasp of
which variables are internal to the function and which are external.
In general you want to focus on remembering that the only outside
information you should be passing the function is the argument
(file1 in this case) and the only information you should get back is
what you return (the dictionary in this case).
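
As a rough sketch of that rule, the function below touches nothing but
its argument and hands back a new dictionary each time. Again, this
leaves the NBRF parser out: makedict_local and the (name, sequence)
tuples are hypothetical stand-ins, not Biopython code.

```python
def makedict_local(records):
    """Build and return a new dictionary from (name, sequence) pairs."""
    result = {}  # local to the function, created fresh on every call
    for name, sequence in records:
        result[name] = sequence
    return result


first = makedict_local([("seq1", "MKV"), ("seq2", "MLA")])
second = makedict_local([("seq3", "MTT")])

print(len(first), len(second))  # 2 1
print(first is second)          # False: each call returned its own dict
```

Because nothing outside the function is read or changed, the two
results cannot interfere with each other.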

But, I digress. Hope this helps.
Brad

