Blog

Installing Anaconda to Set Up a Machine Learning Environment

I am setting up a machine learning environment on my laptop. I am primarily a Python user, so I will need the usual Python libraries. I will also set everything up inside a Python environment so that I can keep my workspace tidy. Creating a Python environment is especially necessary when different projects have different version requirements, for example the legacy project I plan to work on. I am running Ubuntu 18.04 LTS. I am documenting this mostly for my own reference, so that I know what I have done previously.

Step 1 – Download

I start by googling and going to the Anaconda page, then click on Linux under the regular installation instructions. Apparently, as shown on that page, we have two options for installing Anaconda: Miniconda (needs around 400 MB of disk space) and the full Anaconda (needs around 3 GB of disk space). Anaconda comes with all the common packages preinstalled, while Miniconda installs only the basics, so we can add exactly what we need later and avoid a lot of unnecessary packages. I decided to install Miniconda, so I go to this link to download it. As shown in the following figure, I download the 64-bit bash installer for Python 3.6 to my Desktop.
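
If you prefer the terminal over the browser, the installer can also be fetched directly with wget. The URL below is the generic "latest" Miniconda link; it is my assumption of the right address, so double-check it against the download page before using it.

cd ~/Desktop
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh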

Step 2 – Install

Now we need to install Miniconda. Let's open a terminal and change directory to Desktop (or to wherever the installer was downloaded).

shant@shanto:~$ cd ~/Desktop/
shant@shanto:~/Desktop$ ls
Miniconda3-latest-Linux-x86_64.sh
shant@shanto:~/Desktop$ clear

shant@shanto:~/Desktop$ bash Miniconda3-latest-Linux-x86_64.sh 

Welcome to Miniconda3 4.5.4

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>> 

Pressing ENTER lets the installation continue as shown below. I will go with all the default options, as highlighted in the following output.

===================================
Miniconda End User License Agreement
===================================

Copyright 2015, Anaconda, Inc.

All rights reserved under the 3-clause BSD License:
... ...
... ...
for client/server applications by using secret-key cryptography.

cryptography
    A Python library which exposes cryptographic recipes and primitives.


Do you accept the license terms? [yes|no]
>>> yes
Miniconda3 will now be installed into this location:
/home/shant/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/shant/miniconda3] >>> 
PREFIX=/home/shant/miniconda3
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
... ...
... ...
installing: requests-2.18.4-py36he2e5f8d_1 ...
installing: conda-4.5.4-py36_0 ...
installation finished.
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /home/shant/.bashrc ? [yes|no]
[no] >>> yes

Appending source /home/shant/miniconda3/bin/activate to /home/shant/.bashrc
A backup will be made to: /home/shant/.bashrc-miniconda3.bak


For this change to become active, you have to open a new terminal.

Thank you for installing Miniconda3!
shant@shanto:~/Desktop$ 

Miniconda should now be installed. We can verify the installation by typing any conda command, as shown below. We then update conda to the latest version.

shant@shanto:~$ conda --version
conda 4.5.4
shant@shanto:~$ conda update conda
Solving environment: done

## Package Plan ##

  environment location: /home/shant/miniconda3

  added / updated specs: 
    - conda


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ... ...
    ------------------------------------------------------------
                                           Total:         4.6 MB

The following packages will be UPDATED:

    ... ...

Proceed ([y]/n)? y


Downloading and Extracting Packages
conda-4.5.10         |  1.0 MB | ############################################################################################### | 100% 
openssl-1.0.2p       |  3.5 MB | ############################################################################################### | 100% 
certifi-2018.8.13    |  138 KB | ############################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
shant@shanto:~$ 

Step 3 – Setting up Environments

Now we can create and manage environments using conda. It is always better to use a separate environment for each purpose and install the necessary packages only in that environment, so that the environments stay independent and do not interfere with each other. We can list the existing environments using the command conda info --envs. As we have just installed Miniconda, there should be only one environment, 'base', which we can activate using source activate base, as shown below (the name of the active environment is shown in parentheses at the start of the prompt). In the activated environment we can type which python to check that Python is running from our Miniconda installation.

shant@shanto:~$ conda info --envs
# conda environments:
#
base                  *  /home/shant/miniconda3

shant@shanto:~$ source activate base
(base) shant@shanto:~$ which python
/home/shant/miniconda3/bin/python
(base) shant@shanto:~$ 

Now we can add new environments. First, let's deactivate the current environment 'base' by typing source deactivate. When creating a new environment with conda, we can keep the Python version of our original installation (3.6 in this case) or specify any other version (say 3.5 or 2.7). We will create two environments, ml36 for Python 3.6 and ml27 for Python 2.7; their primary purpose is machine learning work. After creating them, we can check that they really exist using the command that lists the environment names. Then we can switch between the environments and check that each one has the right version of Python. The commands for all these tasks are highlighted below.

(base) shant@shanto:~$ source deactivate
shant@shanto:~$ conda create --name ml36 python=3.6
Solving environment: done

## Package Plan ##

... ...
... ...
# To deactivate an active environment, use
#
#     $ conda deactivate

shant@shanto:~$ conda create --name ml27 python=2.7
Solving environment: done

## Package Plan ##

  environment location: /home/shant/miniconda3/envs/ml27

  added / updated specs: 
    - python=2.7

... ...
... ...
shant@shanto:~$ conda info --envs
# conda environments:
#
base                  *  /home/shant/miniconda3
ml27                     /home/shant/miniconda3/envs/ml27
ml36                     /home/shant/miniconda3/envs/ml36

shant@shanto:~$ source activate ml36
(ml36) shant@shanto:~$ python --version
Python 3.6.6 :: Anaconda, Inc.
(ml36) shant@shanto:~$ source deactivate
shant@shanto:~$ source activate ml27
(ml27) shant@shanto:~$ python --version
Python 2.7.15 :: Anaconda, Inc.
(ml27) shant@shanto:~$  
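
As a side note, once an environment is set up it can be snapshotted to a YAML file and recreated later or on another machine. I did not do this in this walkthrough; the two commands below are just a sketch of the standard conda workflow for it.

conda env export --name ml36 > ml36.yml    # write the environment's package list to a file
conda env create --file ml36.yml           # recreate the environment from that file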

Step 4 – Installing Packages

Now we can install the necessary packages into any environment, depending on our needs. Let's activate ml36 and install some packages that are needed for machine learning. First we will install Jupyter Notebook, which is an essential tool for interactive data analysis and experimentation.

The command conda list provides a list of the packages that are already installed. After installing Jupyter Notebook, we will install pandas, spyder, numpy, scikit-learn, tensorflow, keras, pyyaml, h5py, matplotlib, seaborn, argparse and pytorch.

shant@shanto:~$ source activate ml36
(ml36) shant@shanto:~$ conda list
# packages in environment at /home/shant/miniconda3/envs/ml36:
#
# Name                    Version                   Build  Channel
ca-certificates           2018.03.07                    0  
certifi                   2018.8.13                py36_0  
... ...
... ...

xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               ha838bed_2  
(ml36) shant@shanto:~$ conda install jupyter
Solving environment: done

## Package Plan ##

  environment location: /home/shant/miniconda3/envs/ml36
... ...
... ...

Verifying transaction: done
Executing transaction: done
(ml36) shant@shanto:~$ jupyter-notebook
[I 02:27:50.100 NotebookApp] Writing notebook server cookie secret to /run/user/1002/jupyter/notebook_cookie_secret
[I 02:27:50.332 NotebookApp] Serving notebooks from local directory: /home/shant
[I 02:27:50.332 NotebookApp] The Jupyter Notebook is running at:

At this point the Jupyter Notebook installation is done, and running the command jupyter-notebook should open it in a tab of our default browser, as shown in the following figure.

However, if we click on the down arrow of the New button in the top right corner, we do not see the new environment (ml36) listed there. To solve this issue we need to install nb_conda using the command conda install nb_conda in the terminal. Now if we open Jupyter by typing jupyter-notebook in the terminal, we see our environment listed in Jupyter Notebook, as shown below.
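
An alternative that I did not use here, but which should also make the environment show up in Jupyter, is registering it as a kernel with ipykernel. The two commands below are a sketch, to be run inside the activated ml36 environment.

conda install ipykernel
python -m ipykernel install --user --name ml36 --display-name "Python (ml36)"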

Now let's install the packages using the following commands in the terminal while the conda environment is active.

(ml36) shant@shanto:~$ conda install scipy
... ...
(ml36) shant@shanto:~$ conda install pandas
... ...
(ml36) shant@shanto:~$ conda install spyder
... ...
(ml36) shant@shanto:~$ conda install -c conda-forge tensorflow
... ...
(ml36) shant@shanto:~$ conda install -c conda-forge keras
... ...
(ml36) shant@shanto:~$ pip install matplotlib seaborn argparse
... ...
(ml36) shant@shanto:~$ conda install scikit-learn
... ...
(ml36) shant@shanto:~$ conda install -c anaconda xlrd
... ...
(ml36) shant@shanto:~$ conda install -c anaconda beautifulsoup4
... ...
(ml36) shant@shanto:~$ conda install -c bokeh bokeh
... ...
(ml36) shant@shanto:~$ conda install -c bokeh/label/dev bokeh
... ...
(ml36) shant@shanto:~$ conda install -c conda-forge ipywidgets
... ...
(ml36) shant@shanto:~$ conda install pytorch-cpu torchvision-cpu -c pytorch
... ...

I just went with all the default options while installing the packages. As shown in the gist below, all of them are working fine. We need to remember that, depending on the environment (with or without a GPU), we need to install the right version of PyTorch. Since I am installing everything on my laptop, which does not have a GPU, I selected the CPU version of PyTorch. The exact command for a given environment can be generated from the official PyTorch website.
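
A quick way to double-check the installs (my own sketch, separate from the gist) is to import the key packages from the activated ml36 environment and print a couple of versions:

python -c "import numpy, pandas, sklearn, matplotlib, tensorflow, keras, torch; print('all imports OK')"
python -c "import torch; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"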

Install Avro for Ubuntu 18.04 LTS

Avro is a phonetic keyboard for Unicode Bangla typing. For a Windows machine, Avro has executable installers and the installation process is pretty straightforward. However, the installation process on a Linux machine is not as simple (at least compared to the Windows installation). I am an Ubuntu user. As I am upgrading from 16.04 LTS to 18.04 LTS, I need to reinstall Avro, and it seems the installation process is a little different (as always). In this article, I am documenting the whole process for future reference (mostly for myself).

Step 1

From the search option in Ubuntu, we need to search for Language Support as shown in the following figure.


Continue reading “Install Avro for Ubuntu 18.04 LTS”

Web Scraping Using lxml

In this article I will demonstrate a few examples of web scraping. According to the Wikipedia article, web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. There are lots of tutorials on web scraping; in this post I will demonstrate it while solving a few problems, using Python 3 and a few libraries. Let's get into the problem.
Continue reading “Web Scraping Using lxml”

GPGPU Programming with CUDA for Color Space Conversion

General-purpose computing on graphics processing units (GPGPU, rarely GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU) [1].
Continue reading “GPGPU Programming with CUDA for Color Space Conversion”

What I Just Read : Assessing Cardiovascular Risk Factors with Computer Vision

One of my friends on Facebook, who happens to be a data scientist, shared some very exciting news. I do not click all the links that my friends share on Facebook, but this time I had to. The title was enough for any technology enthusiast to at least click for the details.

https://research.googleblog.com/
https://www.nature.com/articles/s41551-018-0195-0.pdf

Setting up Jupyter notebook with Tensorflow, Keras and Pytorch for Deep Learning

I was trying to set up my Jupyter notebook to work on some deep learning problems (image classification on the MNIST and ImageNet datasets) on my laptop (Ubuntu 16.04 LTS). Previously I had used a little bit of Keras (which runs on top of TensorFlow) on a small dataset, but not with Jupyter; for that I installed TensorFlow and Keras independently and used them in a Python script. However, they were not working from my Jupyter notebook. I googled for a solution but found nothing concrete. I tried to activate the tensorflow environment and run jupyter notebook from there, but in vain. I guess the reason is that I downloaded different packages at different times, which may have caused compatibility issues. Therefore, I decided to create a BRAND NEW conda environment for my deep learning endeavor. This is how it goes:
Continue reading “Setting up Jupyter notebook with Tensorflow, Keras and Pytorch for Deep Learning”

Using baseplot for Ploting Geographical Coordinates

Basemap is a great tool for creating maps using python in a simple way. It’s a matplotlib extension, so it has got all its features to create data visualizations, and adds the geographical projections and some datasets to be able to plot coast lines, countries, and so on directly from the library [1].
Continue reading “Using baseplot for Ploting Geographical Coordinates”

Copy File from Cloud HDFS to Local Computer

While I work with big data technologies like Spark on a large dataset, I like to work on the university cloud, where everything is faster. However, for various reasons I sometimes have to move to my local computer (my laptop). This time the reason is that I need to use a matplotlib-related Python package named baseplot, which is not installed on the cloud. However, the data I need to work on is on the cloud HDFS. Therefore, I need to copy the data from HDFS to my local laptop. This can be done in two simple steps, sketched in commands after the list:

Step 1: copy data from HDFS to remote local (not HDFS)
Step 2: copy data from remote local to local (my laptop)
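
In command form the two steps look roughly like this; the host name and file paths are made up for illustration:

hdfs dfs -get /user/shanto/project/data.csv ~/data.csv          # step 1: on the cloud node, HDFS -> its local disk
scp shanto@cloud.university.edu:~/data.csv ~/Desktop/data.csv   # step 2: on the laptop, pull the file over SSH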
Continue reading “Copy File from Cloud HDFS to Local Computer”

Data Science Interview Questions

In this post I am going to compile interview questions for data science roles. A big part of them are questions that I faced during my own interviews; I have also gathered questions from different websites that I found interesting. So, let's get started.

What do you know about bias, variance, and the bias-variance tradeoff?

In statistics and machine learning, the bias–variance tradeoff (or dilemma) is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set [Wikipedia]:

  • The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting). Bias are the simplifying assumptions made by a model to make the target function easier to learn. Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines. Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression [2].
  • The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting). Variance is the amount that the estimate of the target function will change if different training data was used. Low variance suggests small changes to the estimate of the target function with changes to the training dataset. High variance suggests large changes to the estimate of the target function with changes to the training dataset. Generally, nonparametric machine learning algorithms that have a lot of flexibility have a high variance. For example, decision trees have a high variance, that is even higher if the trees are not pruned before use. Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression. Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines [2].
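
For squared-error loss, the tradeoff can be made explicit with the standard decomposition of expected prediction error (a textbook identity, stated from memory rather than taken from the quoted sources):

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2,
\qquad y = f(x) + \varepsilon,\ \operatorname{Var}(\varepsilon) = \sigma^2 .
\]

A flexible model shrinks the first term but inflates the second; the last term is irreducible noise that no model can remove.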

Continue reading “Data Science Interview Questions”

Printing Jupyter Notebook to other File Format

As a data scientist, I frequently use Jupyter notebooks. When writing a report, one might need to print out (on paper) the full notebook. There is a print preview option in the current version of Jupyter Notebook, but no print option.


I tried the CTRL + P command on the print preview page, but the output was horrible (like when we try to print a webpage). I googled and found a better way of doing it.

I am running Jupyter notebook on Ubuntu 16.04. The steps are very simple:

(1) Open a terminal
(2) Change directory to where the notebook is located
(3) Use the command: jupyter nbconvert --to pdf A1.ipynb (A1.ipynb is my notebook)

shanto@shanto:~$ cd ~/Desktop/BigData/706/Assignments/
shanto@shanto:~/Desktop/BigData/706/Assignments$ ls
A1.ipynb
shanto@shanto:~/Desktop/BigData/706/Assignments$ jupyter nbconvert --to pdf A1.ipynb
[NbConvertApp] Converting notebook A1.ipynb to pdf
[NbConvertApp] Writing 25564 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 23494 bytes to A1.pdf
shanto@shanto:~/Desktop/BigData/706/Assignments$ 

The figure shows a snapshot of the generated *.pdf file. The file is reasonably neat, with good formatting.

If we change the --to pdf part to --to whateverFormat, the same command can be used to convert the notebook to other formats. Conversion to a few other formats is shown below.

shanto@shanto:~/Desktop/BigData/706/Assignments$ jupyter nbconvert --to script A1.ipynb
[NbConvertApp] Converting notebook A1.ipynb to script
[NbConvertApp] Writing 2077 bytes to A1.py
shanto@shanto:~/Desktop/BigData/706/Assignments$ # convert to latex
shanto@shanto:~/Desktop/BigData/706/Assignments$ jupyter nbconvert --to latex A1.ipynb
[NbConvertApp] Converting notebook A1.ipynb to latex
[NbConvertApp] Writing 25564 bytes to A1.tex
shanto@shanto:~/Desktop/BigData/706/Assignments$
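
The same approach should also cover HTML output, which does not require a LaTeX toolchain; I have not captured the output here, so treat this as a sketch:

jupyter nbconvert --to html A1.ipynb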