Top 5 amazingly powerful Python libraries for Data Science

Data Science | Published May 18, 2015

Have you decided to learn Python as your programming language? Then you should know the main Python libraries used for data analysis. In this article, we will look at five amazingly powerful Python libraries for Data Science and the best online tutorials for learning them.
Let’s get started!

NumPy

NumPy is the foundation on which the higher-level tools for scientific Python are built. Here are some of the features it provides:

  • The N-dimensional array, a fast and memory-efficient multidimensional array that provides vectorized arithmetic operations.
  • Standard mathematical operations that work on entire arrays of data without writing loops.
  • Easy transfer of data to external libraries written in a low-level language (such as C or C++), and an easy way for those libraries to return data to Python as NumPy arrays.
  • Linear algebra, Fourier transforms, and random number generation.

Although NumPy does not provide high-level data analysis functionality, an understanding of NumPy arrays and array-oriented computing will help you use tools like Pandas much more effectively.
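
To make the points above concrete, here is a minimal sketch of array-oriented computing. The temperature values and variable names are made up for illustration, and `np.random.default_rng` assumes a reasonably recent NumPy release.

```python
import numpy as np

# Hypothetical sample data: daily temperatures in Celsius.
celsius = np.array([18.5, 21.0, 24.5, 19.0])

# Vectorized arithmetic: the whole array is converted without a Python loop.
fahrenheit = celsius * 9 / 5 + 32

# A small linear algebra operation on a 2-D array.
matrix = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(matrix)

# Random number generation from a seeded generator (seed chosen arbitrarily).
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)

print(fahrenheit)
print(inverse)
print(samples.shape)
```

The conversion on the `fahrenheit` line is applied to every element at once, which is the "no loops" style that the higher-level libraries in this article build on.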
Tutorials

SciPy

The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. SciPy is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. It has modules for optimization, linear algebra, integration, and other common tasks in data science.
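
As a rough illustration of the integration and optimization routines mentioned above, here is a small sketch. The `objective` function is a hypothetical example, not something taken from the SciPy documentation.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (the exact answer is 2).
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Optimization: find the minimum of a simple, made-up quadratic.
def objective(x):
    return (x - 3) ** 2 + 1

result = optimize.minimize_scalar(objective)

print(area)                  # close to 2.0
print(result.x, result.fun)  # close to 3.0 and 1.0
```

Both routines accept ordinary Python functions, and `quad` here is given a NumPy function directly, which shows how closely the two libraries work together.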
Tutorial
I couldn’t find any good tutorial other than the one at Scipy.org, which is the best tutorial for learning SciPy.

Pandas

Pandas contains high-level data structures and tools designed to make data analysis fast and easy. It is built on top of NumPy, which makes it easy to use in NumPy-centric applications.

  • Data structures with labeled axes, supporting automatic or explicit data alignment. This prevents common errors caused by misaligned data and makes it easier to work with differently indexed data from different sources.
  • Easier handling of missing data.
  • Merge and other relational operations found in popular databases (SQL-based, for example).

Pandas is the best tool for doing data munging.
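
The three features listed above can be sketched in a few lines. The fruit data and column names below are invented purely for illustration.

```python
import pandas as pd

# Two hypothetical Series with different, partly overlapping labels.
q1 = pd.Series([100, 200, 300], index=["apples", "bananas", "cherries"])
q2 = pd.Series([10, 20, 30], index=["bananas", "cherries", "dates"])

# Automatic alignment: values are matched by label, not by position.
# Labels present on only one side become NaN (missing data).
total = q1 + q2

# Handling missing data: treat an absent label as 0 while adding.
total_filled = q1.add(q2, fill_value=0)

# A SQL-style inner join on a shared key column.
prices = pd.DataFrame({"fruit": ["apples", "bananas"], "price": [1.2, 0.5]})
stock = pd.DataFrame({"fruit": ["bananas", "cherries"], "units": [40, 15]})
joined = pd.merge(prices, stock, on="fruit", how="inner")

print(total)
print(total_filled)
print(joined)
```

Only "bananas" appears in both DataFrames, so the inner join keeps a single row, much as an `INNER JOIN` would in a SQL database.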
Tutorials

Matplotlib

Matplotlib is a Python module for visualization. It allows you to easily make line graphs, pie charts, histograms, and other professional-grade figures, and you can customize every aspect of a figure. When used within IPython, Matplotlib has interactive features like zooming and panning. It supports different GUI back ends on all operating systems, and can also export graphics to common vector and raster graphics formats: PDF, SVG, JPG, PNG, BMP, GIF, etc.
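
Here is a minimal sketch of a customized figure with a line graph and a histogram, exported to two formats. The output file names are hypothetical, and the non-interactive "Agg" back end is selected only so that the script runs without a GUI.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive back end, so no GUI is required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure with two side-by-side plots.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph with a customized color, line style, and legend.
left.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
left.set_title("Line graph")
left.set_xlabel("x")
left.legend()

# Histogram of randomly generated sample data.
rng = np.random.default_rng(seed=0)
right.hist(rng.normal(size=500), bins=20, color="tab:orange")
right.set_title("Histogram")

fig.tight_layout()

# Export the same figure to a raster and a vector format.
png_path = Path("example_figure.png")  # hypothetical file name
pdf_path = Path("example_figure.pdf")  # hypothetical file name
fig.savefig(png_path)
fig.savefig(pdf_path)
```

The output format is inferred from the file extension, so switching from PNG to PDF is only a matter of changing the file name.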
Tutorials

  • ShowMeDo has a good tutorial on Matplotlib.
  • I also recommend the cookbook from Packt Publishing. This is an amazing book for someone getting started with Matplotlib.

Scikit-learn

Scikit-learn is a Python module for machine learning built on top of SciPy. It provides a set of common machine learning algorithms to users through a consistent interface, which helps you quickly apply popular algorithms to your dataset. Have a look at the list of algorithms available in scikit-learn, and you will quickly see that it includes tools for many standard machine learning tasks (such as clustering, classification, and regression).
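
The "consistent interface" is easiest to see in code. The sketch below trains two different classifiers with the same `fit` and `score` calls. The choice of the bundled iris dataset, the two models, and the parameter values are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small bundled dataset and hold out a quarter of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two different algorithms, used through the same methods.
models = [LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)]

scores = {}
for model in models:
    model.fit(X_train, y_train)  # train on the training split
    scores[type(model).__name__] = model.score(X_test, y_test)  # test accuracy

print(scores)
```

Because every estimator exposes the same methods, swapping in another algorithm usually means changing only the line that constructs the model.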
Tutorials

Conclusion

There are also other libraries, such as NLTK (Natural Language Toolkit), Scrapy for web scraping, Pattern for web mining, and Theano for deep learning. But if you are getting started with Python, I recommend first getting familiar with these five libraries. The tutorials I have mentioned are beginner friendly; before going through them, make sure you are familiar with the basics of Python programming.