[Biopython] python advice needed
Kevin Rue
kevin.rue at ucdconnect.ie
Tue Apr 15 16:27:22 UTC 2014
Hi Csaba,
Well done! I see every day in my research group that the transition from
fundamental biology to bioinformatics is not a straightforward process.
Congratulations on your first successful experience.
To give some context to my answer, let me tell you that I am a 3rd year PhD
student trained in bioinformatics for the past 6 years (since my Master's
Degree). Python is the first programming language I was taught during my
Master's Degree (a tiny amount of Matlab in practicals of math before
that), and I was taught the object-oriented programming aspect through
classes of the Java programming language.
I am glad that you managed to teach yourself how to program in Python
through online resources. However, I think that going to actual classes can
ease the learning curve a lot, particularly at the beginning, and for new
topics such as object-oriented programming. The interactive Q&A with the
demonstrator, and the questions of other classmates, can help you quickly
come across some common mistakes and tricks. For instance, a post-doc in my
lab is learning Python just like you, and I have seen him rack his brain for
hours until I came along and pointed him in the right direction (I avoid
simply giving a student the answer: "give someone food and they'll eat for
a day, teach them how to cook and they'll eat for the rest of their life").
Meanwhile, it is always useful to have a book around. I have heard a lot of
good things about the O'Reilly books in that respect. They have Python books
for beginner, intermediate and high-performance programming
(http://shop.oreilly.com/category/browse-subjects/programming/python.do).
Now, if you allow me a few personal pieces of advice about programming
(valid for Python and most languages):
- "Always write pseudo-code first"
- Pseudo-code is "an informal high-level
<http://en.wikipedia.org/wiki/High-level_programming_language> description
of the operating principle of a computer program or other algorithm"
(Thanks Wikipedia, you just saved me 10 minutes of searching for my words)
- In other words, before you even approach your "file.py" script, turn
off the screen of your computer, take a piece of paper, and write down what
your script is supposed to do, what input it will accept, and what outputs
it will generate. First in one sentence of plain English. Then break the
sentence into subtasks. Then continue breaking each of these subtasks into
smaller ones until you recognise small tasks that you feel confident to
code in a reasonable number of lines.
- The pseudo-code is extremely valuable for two reasons:
- It keeps you from losing focus on what the script was originally
intended to do (once coding, it is quite easy to lose sight of the greater
scheme).
- It will help document your script, whether you write a wiki or simply
comment your code (if you share it with someone else, they won't need to
read the entire code to understand its purpose).
- "Draw your objects/classes"
- Essentially, an object/class has a number of attributes
(=variables) and methods (=functions). For each class I typically draw a
box titled with the name of the class. Then in the box, I list the names of
the attributes and the names of the methods. The names of the attributes
and methods should clearly represent what they are meant to contain
(attributes) or do (methods).
- I still apply a rule that one of my earliest programming teachers
taught us: "functions are meant to do stuff, therefore their name should
always start with a verb of action".
- "Google is your friend"
- That's a tricky one, but every time you know what you want to do
but you don't know how on earth to do it: Google your problem. You may
have to browse a while, or try different search words, but in my
experience "whatever problem you run into while writing working and
efficient code, someone else likely had the same problem before you". If
you can clearly explain your problem, StackOverflow and other such websites
may have the answer.
- Use a code versioning tool
- All the changes you have made over the past week have made your
script worse and you don't have a copy of last week's script? Version
control tools such as git/GitHub and svn will help you keep track of what
your code looked like along the way. This way, you can edit a script that
is working to try and enhance it without the fear of messing it up. If it
goes sour, you can just go back to the working script without having to
keep a separate backup.
- Use a friendly (but still powerful) development environment
- IDEs (integrated development environments) are software meant to make
programming easier. A (silly?) example is a feature I cannot work without:
auto-completion. Tired of typing the same long variable name over and over
again? Once you have defined "variable=5" in your script, a decent IDE will
let you type only "var" and open a friendly pop-up window suggesting all
existing variables and methods starting with "var". Select the one you need
with the arrow keys and hit TAB: you don't have to type the rest of the
variable name. An amusing side-effect of this is that your variable names
will grow longer (and therefore be more explicit about what they contain).
IDEs come with many more features, including code checking, spell
checking, ...
- For Python I am very happy with PyCharm
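To make the first two pieces of advice a bit more concrete, here is a
minimal sketch in Python. It is only an illustration, not code from any
existing package: the class name SequenceRead, its attributes, the function
filter_reads and the quality threshold are all hypothetical names I made up
for the example. The comments at the top play the role of the pseudo-code,
and the class follows the box drawing: attributes first, then methods whose
names start with a verb.

```python
# Pseudo-code, written before any real code:
#   1. Take a read (an identifier, a sequence, one quality score per base).
#   2. Compute the mean quality of the read.
#   3. Keep the read only if its mean quality reaches a threshold.


class SequenceRead:
    """Hypothetical class: one box of the drawing, turned into code."""

    def __init__(self, identifier, sequence, qualities):
        # Attributes (= variables): the names say what they contain.
        self.identifier = identifier
        self.sequence = sequence
        self.qualities = qualities

    # Methods (= functions): the names start with a verb of action.
    def compute_mean_quality(self):
        """Step 2: return the average of the per-base quality scores."""
        if not self.qualities:
            return 0.0  # assumption: a read without scores counts as quality 0
        return sum(self.qualities) / len(self.qualities)


def filter_reads(reads, minimum_mean_quality):
    """Step 3: keep only the reads whose mean quality reaches the threshold."""
    return [
        read for read in reads
        if read.compute_mean_quality() >= minimum_mean_quality
    ]


# Made-up example data, only to show how the pieces fit together.
reads = [
    SequenceRead("read_1", "ACGT", [30, 32, 28, 30]),
    SequenceRead("read_2", "ACGT", [10, 12, 8, 10]),
]
kept_reads = filter_reads(reads, minimum_mean_quality=20)
print([read.identifier for read in kept_reads])
```

Each numbered step of the pseudo-code ends up as one small, clearly named
piece of code, and the pseudo-code itself stays at the top of the file as
documentation.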
This email ended up being much longer than I intended, but I hope you
will find it useful!
The learning curve for Python programming can be rough. Learning additional
tricks like version control, IDE, and object-oriented programming can make
it even steeper, but the end result is a very rewarding skillset that can
be helpful in many circumstances and appeal to many research group leaders
too!
Best of luck with learning Python!
Kevin
On 15 April 2014 15:58, Csaba Kiss <csaba.kiss at lanl.gov> wrote:
> Hi!
> I need some advice how to get better in python. I have written a software
> package to analyze antibody deep sequencing data. This was my first
> experience with python and I am not a programmer. The end result works,
> however, if a professional coder looks at the scripts, it is obvious that
> it was written by an amateur. I am planning to re-write the code into a
> better format that is extendable and more user and coder friendly. At the
> moment the script only relies on biopython to get the sequences and quality
> values out of sff and fastq files, the rest is custom written. I would like
> to rely more on biopython and also perhaps extend biopython with new
> features.
> The problem I am having is object oriented python and classes. I
> understand the concept of both, but it's completely different to actually
> use it. I would like to ask help from scientist who are in a similar
> situation, as myself. I am a molecular biologist with interest in coding,
> but little background. Do you have any good tutorials books about python
> classes and OOP? For example, when I learned python I found the Google
> python class, extremely valuable. I practically looked at the videos and
> solved the problems and that sent me on my way to python:
> https://developers.google.com/edu/python/?csw=1
>
> Any help would be appreciated:
> Csaba
>
> --
> Best Regards:
> Csaba Kiss PhD, MSc, BSc
> TA-43, HRL-1, MS888
> Los Alamos National Laboratory
> Work: 1-505-667-9898
> Cell: 1-505-920-5774
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
--
Kévin RUE-ALBRECHT
Wellcome Trust Computational Infection Biology PhD Programme
University College Dublin
Ireland
http://fr.linkedin.com/pub/k%C3%A9vin-rue/28/a45/149/en