Robust Python Environment for Astronomy and Machine Learning
1. Introduction
Programming is now close to an essential skill. It is useful for testing models (e.g. simulations) created by theorists and for analyzing empirical data (e.g. images) gathered by observers. Python has become a popular choice because of its relative simplicity and power.
There is a wealth of free, open source software available within the Python ecosystem. This document will guide you through the basics of setting up a robust and convenient platform-independent Python environment, and will introduce you to the version control tool git and the code hosting service GitHub. I will also introduce Jupyter notebooks, which are a convenient way to develop and share code ideas. They also let you learn about various subjects interactively, right in your browser, using the enormous collections of notebooks already freely available.
If you haven't tried Python, I recommend learning it interactively via Codecademy: https://www.codecademy.com/learn/python
If you like to learn from video examples, try this YouTube tutorial: https://www.youtube.com/user/sentdex/playlists
2. Python Setup
We will use miniconda, a light-weight, command line-only version of the powerful Python distribution Anaconda. First, download the Python 3.5 version of the miniconda installer for your platform from here (http://conda.pydata.org/miniconda.html). I do not recommend using Windows for scientific computing, so if you have a PC, consider installing one of the many user-friendly flavors of Linux (Mint: https://www.linuxmint.com/download.php), either natively/dual-boot or in a virtual machine (Virtualbox: https://www.virtualbox.org/wiki/Downloads).
2.1. Installing Python
The following instructions are for a Linux machine. Open a new terminal (CTRL+ALT+T) and go to the directory that contains the downloaded miniconda installer:
$ cd ~/Downloads
Make the installer executable:
$ chmod +x Miniconda3-latest-Linux-x86_64.sh
and then run it:
$ ./Miniconda3-latest-Linux-x86_64.sh
2.2. Creating your first Python environment
We will create an environment called py35 with python 3.5 installed.
$ conda create -n py35 python=3.5
Then, activate (enter) the environment:
$ source activate py35
Note that the prompt of your terminal changes to:
(py35) $
where the name inside the parentheses indicates the environment you are using. To see what is installed within the py35 environment:
(py35) $ conda list
And it outputs something like:
openssl: 1.0.2h-1
pip: 8.1.2-py35_0
python: 3.5.1-0
readline: 6.2-2
setuptools: 23.0.0-py35_0
sqlite: 3.13.0-0
tk: 8.5.18-0
wheel: 0.29.0-py35_0
zlib: 1.2.8-3
These packages are needed to make Python 3.5 work. Some of them, such as pip, are used to install other libraries. Now, leave the py35 environment:
(py35) $ source deactivate
$
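If you are ever unsure which environment a piece of Python code is running in, you can ask the interpreter itself. The following is a minimal sketch, not part of the original instructions: the function name describe_environment is made up for illustration, and it assumes that conda sets the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        # Full path of the python executable currently running.
        "executable": sys.executable,
        # Root folder of the environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Version as a string such as "3.5.1".
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Set by conda on activation (assumption); None when it is not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

Run inside py35, the executable and prefix entries should point into the py35 folder of your miniconda installation.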
2.3. Creating More than One Environment
If you want to create a new environment called test with an older version of Python in it:
$ conda create -n test python=2.7
$ source activate test
(test) $
Check that it has Python 2.7 installed:
(test) $ python --version
Python 2.7.11 :: Continuum Analytics, Inc.
(test) $ source deactivate
Great. Now, if you want to see the list of environments you have created,
$ conda env list
root /home/usrname/miniconda
py35 /home/usrname/miniconda/envs/py35
test /home/usrname/miniconda/envs/test
If you want to remove the test environment (because you already have the more recent Python 3.5):
$ conda remove -n test --all
An environment works like a folder that holds its own copy of certain programs. If you run code using Python 3, for example, it only uses the programs inside that environment (i.e. py35), and if you break something, the damage stays inside that environment. You can keep another environment (e.g. test) with Python 2 for other code. Without the environment capability provided by conda, it is difficult to have Python 2 and 3 working side by side because they require different dependencies.
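Because each environment carries its own Python version, a script can protect itself against being run in the wrong one. The sketch below shows one possible guard; the helper name require_python and its current parameter are hypothetical and exist only for this illustration.

```python
import sys


def require_python(minimum, current=None):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` is a tuple such as (3, 5). `current` defaults to the running
    interpreter; it is a parameter only so that the check can be tried
    out with other version tuples.
    """
    if current is None:
        current = tuple(sys.version_info[:2])
    if tuple(current) < tuple(minimum):
        raise RuntimeError(
            "Python {}.{} or newer is required, found {}.{}".format(
                minimum[0], minimum[1], current[0], current[1]
            )
        )
    return current


# A script written for the py35 environment could start with this line.
require_python((3, 5))
```

With such a line at the top, running the script from a Python 2.7 environment stops immediately with a readable message instead of failing later in a confusing way.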
2.4. Installing Libraries Within the Environment
Now we will install libraries (e.g. astroml) and programs (e.g. jupyter, git) within the py35 environment. First activate the environment, then use pip to install astroML, a machine learning package for astronomy in Python:
$ source activate py35
(py35) $ pip install astroml
Now, launch ipython, the interactive version of Python (if the command is not found, install it first with pip install ipython):
(py35) $ ipython
You have successfully launched ipython if your terminal prompt changes to:
In [1]:
To check the version of your Python:
In [1]: !python --version
Python 3.5.1 :: Continuum Analytics, Inc.
You can check that the astroML library you installed works by importing it; it works if no error is shown. Note that the import name is case-sensitive:
In [2]: import astroML
In [3]:
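The same check can be done for several packages at once. The following sketch assumes nothing beyond the standard library; the function name check_imports is made up for illustration, and the version lookup relies on the common (but not universal) convention that a module exposes a __version__ attribute.

```python
import importlib


def check_imports(names):
    """Try to import each module name and report the outcome.

    Returns a dict mapping each name to (True, version or None) on
    success, or (False, error message) when the import fails.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as error:
            # A broken install can raise errors other than ImportError.
            results[name] = (False, str(error))
        else:
            results[name] = (True, getattr(module, "__version__", None))
    return results


if __name__ == "__main__":
    # Import names are case-sensitive: use "astroML", not "astroml".
    for name, (ok, detail) in check_imports(["astroML"]).items():
        print(name, "OK" if ok else "MISSING", detail)
```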
Note that to run a script within ipython, say my_program.py, you need to go to the directory where the script is located:
In [3]: cd /home/Desktop/
In [4]: run my_program.py
To get out of ipython:
In [5]: exit
And you will see you're back to:
(py35) $
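For the curious, the run command used above behaves roughly like the standard library function runpy.run_path: the file is executed from top to bottom and the variables it defines become available afterwards. The sketch below illustrates this with a tiny stand-in script written to a temporary folder; the script contents are invented for the example.

```python
import os
import runpy
import tempfile

# Write a tiny stand-in for my_program.py into a temporary folder.
folder = tempfile.mkdtemp()
script_path = os.path.join(folder, "my_program.py")
with open(script_path, "w") as handle:
    handle.write("answer = 6 * 7\nprint('answer is', answer)\n")

# run_path executes the file and returns the variables it defined.
namespace = runpy.run_path(script_path, run_name="__main__")
print(namespace["answer"])
```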
3. Jupyter Notebook
We will view and edit code using Jupyter (http://jupyter.org/). Jupyter is perfect if you want to learn about Python, particular Python packages, or even entire subjects: there is an enormous number of freely available documents (https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks), called Jupyter Notebooks, which can help you. Some contain good introductions to the basics of Python, some contain advanced tutorials about complex data analysis methods and tools, and some are entire books on a particular subject.
In all cases, Jupyter Notebooks are interactive: you read them and run the code they contain right in your browser. They are also great for sharing a new idea with a collaborator, or just for testing your own code and exploring your data. Install Jupyter with pip:
(py35) $ pip install jupyter
When you open a notebook, a browser window opens and renders it. This browser window is connected to a Python kernel running in the background, so you can run the code you see, modify it, and so on. This is what makes Jupyter Notebooks so powerful. There is plenty of online documentation on how to use Jupyter Notebooks, but a good starting point is this lecture (http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-0-Scientific-Computing-with-Python.ipynb).
Download this sample notebook by clicking the down arrow in the upper right corner: https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Running%20Code.ipynb
Go to the directory where it is saved and open it:
(py35) $ cd ~/Downloads/
(py35) $ jupyter notebook "Running Code.ipynb"
A new tab will open in your default web browser. Now you can read the document and run the parts that contain code. Those cells are marked with the ipython marker In [1]. Click such a cell and press SHIFT+ENTER, and you will see the output next to Out [1].
To close the notebook, click File and then Close and Halt. If you just close the tab using the X button, the notebook will still be running. The notebook server also keeps running in the terminal until you stop it by pressing CTRL+C there. Note that you cannot use that terminal for other commands while a jupyter notebook is running; open a new terminal with CTRL+ALT+T instead.
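It can help to know that a notebook file (.ipynb) is plain JSON: a list of cells, each marked as code or markdown. The sketch below builds a minimal notebook by hand in the version 4 layout and extracts the code cells; the function name code_cells and the cell contents are invented for the example, and a real notebook carries more metadata than shown here.

```python
import json

# A minimal notebook in the nbformat 4 layout, built by hand.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})


def code_cells(text):
    """Return the source of every code cell in a notebook JSON string."""
    notebook = json.loads(text)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # "source" may be a single string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            sources.append(source)
    return sources


print(code_cells(notebook_text))
```

The cells returned here are the ones that appear with an In [ ] marker in the browser.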
4. Git and GitHub
We’re almost done. Have you seen these?
thesis.doc
thesis2.doc
thesis_final.doc
thesis_super_final2.doc
It is difficult to keep track of the versions of your files, right? To put your files under version control, I introduce git (https://git-scm.com/). Now let us install git:
(py35) $ conda install git
If that was successful, let us download a treasure trove of ready-made code written by the authors of a book (http://www.astroml.org/index.html). To download the code that accompanies the book:
(py35) $ git clone https://github.com/astroML/astroML.git
Basically, we are copying what is available online (https://github.com/astroML/astroML) onto your hard drive so that we can view and play with the code. Take the time to learn how to use git; it will change your life. The documentation is on the git website (https://git-scm.com/), and there are plenty of other great online resources to help you, including tutorials with terminal emulators running right in the browser to help get you started (e.g. https://try.github.io/levels/1/challenges/1).
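To get a feeling for what version control keeps track of, the sketch below compares two made-up versions of a short file using the standard library module difflib. This is only an illustration of line-by-line change tracking, not how git is implemented; the file names and contents are invented.

```python
import difflib

# Two made-up versions of the same short file.
old_version = ["Chapter 1\n", "Stars are hot.\n", "The end.\n"]
new_version = ["Chapter 1\n", "Stars are very hot.\n", "The end.\n"]


def changed_lines(old, new):
    """Return only the added and removed lines between two versions."""
    diff = difflib.unified_diff(old, new, fromfile="thesis_v1", tofile="thesis_v2")
    return [
        line.rstrip("\n")
        for line in diff
        # Keep "+"/"-" lines but skip the "+++"/"---" file headers.
        if line[:1] in "+-" and not line.startswith(("+++", "---"))
    ]


print(changed_lines(old_version, new_version))
```

Instead of saving thesis_final.doc and thesis_super_final2.doc side by side, a version control tool records changes like these together with a date and a message.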
Let us come back to astroML later. For now, leave the py35 environment:
(py35) $ source deactivate
$
Update everything in your system:
$ sudo apt-get update && sudo apt-get upgrade -y
If everything is working fine, then let’s start with astroML.
Activate the environment:
$ source activate py35
Go to the directory of the downloaded astroML repository:
(py35) $ cd ~/Downloads/astroML
Open the downloaded code in the notebook interface:
(py35) ~/Downloads/astroML $ jupyter notebook
A new tab will open in your default browser. Here you can navigate through the folders by clicking book_figures and then chapter1. To close this, press CTRL+C in the terminal. Note that you cannot use that terminal while the notebook is running, but you can open a new terminal with CTRL+ALT+T.
Now you're good to go.