There is no doubt that Numpy and pandas are an integral part of data science projects. Numpy (Numerical Python) is a robust Python library for numerical computing, with applications ranging from linear algebra to image processing. Numpy arrays support a wide range of operations, and the library offers many functions you can use as your requirements dictate. In this tutorial, we will discuss the top 10 Numpy array functionalities in data science.
What is Numpy?
- Numpy is a robust Python library for working with arrays. It also provides functions for mathematical tasks such as linear algebra and Fourier transforms.
- Numpy was created by Travis Oliphant in 2005. It is an open-source library that is free for everyone to use, and its name stands for Numerical Python.
- Numpy was created largely to overcome the slowness of Python lists, which can also be used as arrays. Numpy arrays are often cited as being up to 50x faster than traditional Python lists, although the actual speedup depends on the operation and the size of the data.
- The array object in Numpy is called ndarray, and Numpy provides many functions for working with ndarrays.
- Numpy relies on the principle of ‘locality of reference’: it stores an array's elements in one continuous block of memory, so they can be accessed and processed efficiently. A Python list, by contrast, stores references to objects scattered across memory, which makes it much slower for numerical work.
- The library is also optimized for modern CPU architectures, which makes array operations even faster.
- Finally, Numpy is written partly in Python, but the parts that require fast computation are written in C or C++.
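To see the contiguous memory layout described above, you can inspect an array's memory attributes. This is a minimal sketch; it sets a 64-bit integer dtype explicitly (an assumption not in the original examples) so the byte counts are the same on every platform.

```python
import numpy as np

# Explicit int64 dtype so each element occupies exactly 8 bytes.
x = np.array([11, 22, 33, 44, 55], dtype=np.int64)

print(type(x))                  # <class 'numpy.ndarray'>
print(x.flags['C_CONTIGUOUS'])  # True: elements sit in one continuous block
print(x.itemsize)               # 8 bytes per element
print(x.strides)                # (8,): step 8 bytes to reach the next element
print(x.nbytes)                 # 40 = 5 elements * 8 bytes
```

A Python list, by contrast, stores references to separate objects, so there is no single block of element data to inspect this way.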
1. Array Creation
The first and foremost step in mastering Numpy arrays is learning how to create one.
Numpy offers two similar functions for creating an array. Here, we will be using np.array().
```python
#creating an array
import numpy as np
test = [11, 22, 33, 44, 55]
x = np.array(test)
x
```
```
array([11, 22, 33, 44, 55])
```
There is another function, np.asarray(), that can be used in the same way.
```python
#creating an array
import numpy as np
test = [11, 22, 33, 44, 55]
x = np.asarray(test)
x
```
```
array([11, 22, 33, 44, 55])
```
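As expected, it produces the same output as above, so you can use either np.array() or np.asarray() to create an array. The main difference is that np.asarray() does not make a copy when its input is already an ndarray, whereas np.array() copies its input by default.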
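The difference between np.array() and np.asarray() shows up when the input is already an ndarray. This small sketch checks object identity with `is` and then modifies the original array to see which result reflects the change.

```python
import numpy as np

a = np.array([11, 22, 33, 44, 55])
b = np.array(a)    # np.array() copies its input by default
c = np.asarray(a)  # input is already an ndarray, so no copy is made

print(b is a)  # False: b is a new array
print(c is a)  # True: c is the very same object as a

a[0] = 99
print(b[0], c[0])  # 11 99: only c sees the change
```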
2. Array Shape
Before performing an operation on an array, you often need to know its shape. You can find the shape of an array using its .shape attribute.
The concept of an N-dimensional array is very important, as the elements stored in such an array all share the same type and size.
You can use these N-D arrays to perform a wide range of mathematical operations.
```python
#shape
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
test.shape
```
```
(2, 5)
```
As shown above, the .shape attribute (written without parentheses, since it is not a function) tells you that this array has 2 rows and 5 columns.
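The same .shape attribute works for any number of dimensions, and the related .ndim attribute reports how many dimensions there are. A short sketch; np.zeros() is used here only as a quick way to build a 3-D array.

```python
import numpy as np

one_d = np.array([11, 22, 33])
two_d = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
three_d = np.zeros((2, 3, 4))  # 2 blocks of 3 rows x 4 columns, filled with 0.0

for arr in (one_d, two_d, three_d):
    # ndim is the number of dimensions; shape gives the length along each one
    print(arr.ndim, arr.shape)
```

This prints `1 (3,)`, then `2 (2, 5)`, then `3 (2, 3, 4)`.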
3. Array Indexing
Indexing is one of the most important concepts when working with data. If you are familiar with indexing values in a list, then you will find this easy.
You can index an array in much the same way. Indexing lets you extract exactly the data you need, which makes it essential for data processing and analysis.
Note that indexing starts from 0: the first element has index 0, the second has index 1, and so on. You access an element by putting its index in square brackets.
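```python
#indexing
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
# test[row, column]: the first element of row 0 and of row 1
print('The first numbers in each array are =', test[0, 0], "and", test[1, 0])
```
```
The first numbers in each array are = 1 and 8
```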
Note that when you are working with multi-dimensional arrays, you first specify which inner array (the row) you want, followed by the index of the element within it (the column).
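The sketch below uses the same 2-D array to show a few more indexing forms: a single index returns a whole row, test[1][2] and test[1, 2] pick the same element, and a negative index counts from the end.

```python
import numpy as np

test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])

second_row = test[1]         # one index selects a whole row: 8, 9, 10, 11, 12
chained = test[1][2]         # row 1, then element 2 of that row -> 10
combined = test[1, 2]        # same element with a single index tuple -> 10
last_of_first = test[0, -1]  # negative indices count from the end -> 5

print(second_row, chained, combined, last_of_first)
```

The single-tuple form test[1, 2] is generally preferred, since it indexes the array once instead of first building the intermediate row.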
4. Array Slicing
Slicing can be a little trickier than indexing. Slicing is a technique for retrieving a range of values from an array.
Let’s understand this with an example.
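```python
#slicing
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
test[0:1]
```
```
array([[1, 2, 3, 4, 5]])
```
```python
#slicing
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
test[0:2]
```
```
array([[ 1, 2, 3, 4, 5], [ 8, 9, 10, 11, 12]])
```
```python
#slicing
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
test[1:2]
```
```
array([[ 8, 9, 10, 11, 12]])
```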
If you look at the first example, test[0:1] returns only row 0: a slice start:stop includes the start index but stops before the stop index, so the last index included is the stop value minus 1 (n-1). Practice with a few more examples and the pattern will quickly become clear.
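Slicing works along each dimension separately, using the same start:stop rule plus an optional step (start:stop:step). A short sketch on the same array:

```python
import numpy as np

test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])

part_of_row = test[0, 1:4]  # row 0, columns 1 to 3 (stop index 4 excluded) -> 2, 3, 4
first_cols = test[:, 0:2]   # every row, columns 0 and 1 -> [[1, 2], [8, 9]]
every_other = test[1, ::2]  # row 1 with a step of 2 -> 8, 10, 12

print(part_of_row)
print(first_cols)
print(every_other)
```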
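5. Array Multiplication
Multiplication is one of the most commonly used arithmetic operations on arrays. Here, let's try to multiply two different arrays.
```python
#multiplication
import numpy as np
x = np.array([11, 22, 33])
y = np.array([[1], [2], [3]])  # a 3x1 column vector (values chosen for illustration)
z = np.matmul(x, y)  # (3,) times (3, 1) gives an array of shape (1,)
z
```
For this, Numpy offers a function named np.matmul(), which multiplies two arrays as shown above. Note that np.matmul() performs matrix multiplication; for element-wise multiplication, use the * operator or np.multiply().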
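Because "multiplying two arrays" can mean two different things, this sketch contrasts them on two small 2x2 matrices chosen for illustration: element-wise multiplication with *, and matrix multiplication with np.matmul() (or the equivalent @ operator).

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b               # same as np.multiply(a, b): multiply matching positions
matrix_product = np.matmul(a, b)  # same as a @ b: rows of a times columns of b

print(elementwise)     # [[5, 12], [21, 32]]
print(matrix_product)  # [[19, 22], [43, 50]]
```

For example, the top-left entry of the matrix product is 1*5 + 2*7 = 19, while the element-wise result there is simply 1*5 = 5.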
6. Array Mean
The mean of the values is the sum of all the values divided by the total number of values. Numpy offers a function named np.mean() to find the mean of the array values.
```python
#mean
import numpy as np
x = np.array([12, 34, 23, 45, 54, 32, 12, 34, 90, 87, 65])
y = round(np.mean(x), 2)  # round the mean to 2 decimal places
y
```
Here, I have used Python's built-in round() function to limit the result to 2 decimal places; for this array, the mean rounds to 44.36.
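To connect np.mean() with the definition above, this sketch computes the same mean by hand (sum divided by count) and then uses np.mean()'s axis parameter on the 2-D array from earlier sections to get one mean per row or per column.

```python
import numpy as np

x = np.array([12, 34, 23, 45, 54, 32, 12, 34, 90, 87, 65])
manual_mean = x.sum() / x.size  # sum of all values divided by how many there are
print(np.isclose(manual_mean, np.mean(x)))  # True

test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
row_means = np.mean(test, axis=1)  # one mean per row -> 3.0 and 10.0
col_means = np.mean(test, axis=0)  # one mean per column -> 4.5, 5.5, 6.5, 7.5, 8.5
print(row_means, col_means)
```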
7. Array Flattening
Array flattening is the process of reducing an N-dimensional array to a single dimension, i.e. a 1-D array. This is useful when you need to work with all of an array's values as one sequence.
Numpy offers the np.ndarray.flatten() method to ease this process. Let's understand this with an example.
```python
#flattening
import numpy as np
test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])
flattened = np.ndarray.flatten(test)  # equivalent to test.flatten()
flattened
```
```
array([ 1, 2, 3, 4, 5, 8, 9, 10, 11, 12])
```
You can observe that the N-dimensional array has become a 1-D array now. This is a very handy function when working with N-D arrays.
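By default, flatten() reads the array row by row. The sketch below also passes order='F' to read column by column instead, and shows that flatten() returns a copy, so changing the flattened array leaves the original untouched.

```python
import numpy as np

test = np.array([[1, 2, 3, 4, 5], [8, 9, 10, 11, 12]])

row_major = test.flatten()           # default order='C': row by row
col_major = test.flatten(order='F')  # column by column

print(col_major)  # 1, 8, 2, 9, 3, 10, 4, 11, 5, 12

row_major[0] = 100  # modify the flattened copy...
print(test[0, 0])   # ...the original still holds 1
```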
8. Arange
Be careful not to confuse arange with arranging (sorting) values. The np.arange() function creates an array of evenly spaced values: you specify the starting number, the ending number, and the interval (step) between values.
The example below shows this.
```python
#arange
import numpy as np
x = np.arange(10, 100, 5)  # start=10, stop=100 (excluded), step=5
x
```
```
array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
```
As you can see, we passed 10 as the starting number, 100 as the ending number, and 5 as the interval, and np.arange() returned an array based on these inputs. Note that the ending number is excluded, which is why the array stops at 95 rather than 100.
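A few more np.arange() calls, sketched to show its defaults: with a single argument, the start defaults to 0 and the step to 1; the stop value is always excluded; and the step may also be a float.

```python
import numpy as np

print(np.arange(5))  # start defaults to 0 and step to 1: 0, 1, 2, 3, 4

x = np.arange(10, 100, 5)
print(len(x), x[-1])  # 18 values, the last one 95; the stop value 100 is excluded

quarters = np.arange(0, 1, 0.25)  # a float step
print(quarters)                    # 0.0, 0.25, 0.5, 0.75
```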
9. Sorting
Sorting arranges the values of an array in a particular order. The values in an array are often in no particular order, and sometimes we need them sorted.
```python
#sorting
import numpy as np
x = np.array([12, 34, 54, 23, 45, 66, 87, 43, 56, 32, 10, 45])
np.sort(x)
```
```
array([10, 12, 23, 32, 34, 43, 45, 45, 54, 56, 66, 87])
```
You can see that all the values are sorted in ascending order. Note that np.sort() returns a sorted copy and leaves the original array unchanged. This is a very handy function when working with arrays.
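To get descending order, a common approach is to reverse the ascending result with the [::-1] slice. The sketch below does that and also uses the ndarray.sort() method, which sorts the array in place instead of returning a copy.

```python
import numpy as np

x = np.array([12, 34, 54, 23, 45, 66, 87, 43, 56, 32, 10, 45])

ascending = np.sort(x)         # new sorted array; x itself is unchanged
descending = np.sort(x)[::-1]  # reverse the ascending result for descending order
print(descending)
print(x[0])                    # still 12: np.sort() did not modify x

x.sort()                       # the ndarray.sort() method sorts x in place
print(np.array_equal(x, ascending))  # True
```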
10. Random Values
You may have used Python's random module to generate random numbers. Similarly, for arrays you can use a function called np.random.rand(), which fills an array of the given shape with random values drawn uniformly from the interval [0, 1).
```python
#random values
import numpy as np
np.random.rand(1, 5)  # shape (1, 5): 1 row of 5 values
```
```
array([[0.1183276 , 0.211124 , 0.52514465, 0.02092656, 0.79477222]])
```
That’s awesome, you are gradually getting the hang of Numpy arrays. Note that the function above generated an array of shape (1, 5), i.e. one row containing 5 values. Since the values are random, your output will differ from the one shown.
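If you need the same "random" values every time, for example to make an experiment reproducible, you can seed a generator. This sketch uses Numpy's newer Generator API (np.random.default_rng()), an alternative to the np.random.rand() call used above; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator -> reproducible values
values = rng.random((1, 5))           # same shape as np.random.rand(1, 5), values in [0, 1)
print(values.shape)                   # (1, 5)

rng_again = np.random.default_rng(seed=42)
repeat = rng_again.random((1, 5))
print(np.array_equal(values, repeat))  # True: same seed, same values
```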
Wrapping Up – Numpy Arrays
The numpy array and its functionalities are very useful when working with arrays in a data science project. Numpy offers plenty of functions for performing the operations shown above effectively. We don't just call it a robust library; it has earned that reputation. I hope this has introduced you to some of the most important Numpy array operations.
That’s all for now. Happy Python!!!