Data structures – Python Lists, Pandas Series and Numpy Arrays

Python Lists vs. Pandas Series vs. Numpy Arrays: A Brief Report

As a data scientist or analyst, you spend most of your time understanding and analyzing data. To interpret your data well, or even to analyze it at all, knowing data structures is paramount. Python has many built-in data structures, such as lists, tuples, dictionaries, and sets.

Similarly, the two main libraries for data analysis, Pandas and Numpy, provide their own data structures. Today, in this story, I will walk you through Python lists, Pandas Series, and Numpy arrays. These are building blocks that will help you in many ways.


More About Data Structures

  • A data structure is used to store data in a system in an organized way, so that the data is easy to work with.
  • Note that a data structure is not tied to any programming language. It is a way of organizing data, together with the operations (algorithms) that work on it, and it can be implemented in almost any language.
  • Why do we need data structures? As technology grows, applications become more complex and the amount of data grows every second. Poorly organized data causes problems with speed, searching, parallel processing, and retrieval, and these problems can slow down your system. Keeping your data organized helps you overcome them.
  • There are two types of data structures: primitive and non-primitive. Primitive data structures (such as integers, floats, and characters) are operated on directly by machine instructions. Non-primitive data structures are more complex and are derived from the primitive ones.
  • Some of the key operations on data structures are searching, sorting, insertion, deletion, and updating.
  • Their key advantages are efficient use of storage and time, reusability, and easier data manipulation.
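The key operations listed above can be illustrated on a built-in Python list. The following is a minimal sketch: the list name `scores` and its values are made up for illustration, and the example shows one operation per step.

```python
# Hypothetical sample data, used only for illustration.
scores = [42, 17, 8, 99, 23]

# Searching: membership test and position lookup.
found = 99 in scores          # True
position = scores.index(99)   # 3

# Insertion: add a value at a given position.
scores.insert(0, 5)           # [5, 42, 17, 8, 99, 23]

# Deletion: remove the first occurrence of a value.
scores.remove(17)             # [5, 42, 8, 99, 23]

# Updating: overwrite the element at index 1.
scores[1] = 40                # [5, 40, 8, 99, 23]

# Sorting: reorder the list in place.
scores.sort()                 # [5, 8, 23, 40, 99]

print(found, position, scores)
```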

Python Lists

Python has 4 built-in collection data types: dictionaries, tuples, lists, and sets. A list can store values of different data types, such as int, float, and string. A list can even store another list inside it.
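Here is a short sketch of a list that mixes types and contains a nested list. The variable name `mixed` and its values are illustrative only.

```python
# A list can mix types and even hold another list (a nested list).
mixed = [7, 3.14, "hello", True, [1, 2, 3]]

print(type(mixed[0]).__name__)   # int
print(type(mixed[2]).__name__)   # str
print(mixed[4][1])               # 2 (index into the inner list)
print(len(mixed))                # 5 (the inner list counts as one element)
```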

There are many methods you can use while working with lists in Python. Some of the most important ones are append(), insert(), remove(), sort(), and copy(). Items can also be deleted by position with pop() or with the del statement.
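The following minimal sketch shows the deletion options, pop() and del, together with copy(). The list `fruits` is a hypothetical example.

```python
fruits = ["banana", "apple", "cherry", "kiwi"]

last = fruits.pop()      # remove and return the last item -> "kiwi"
first = fruits.pop(0)    # remove and return the item at index 0 -> "banana"
del fruits[0]            # delete by index with the del statement

backup = fruits.copy()   # shallow copy: a new list object
backup.append("mango")   # changing the copy leaves the original untouched

print(last, first)       # kiwi banana
print(fruits)            # ['cherry']
print(backup)            # ['cherry', 'mango']
```

Because copy() returns a new list, appending to `backup` does not change `fruits`. Note that the copy is shallow, so nested lists inside it would still be shared.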

We won't go deeper into lists here. Instead, here are some examples that will introduce you to lists and their operations.

Create a list

#create a list, then remove an item

demo_list = [1,4,2,5,8,6,9]
demo_list.remove(4)   # removes the first occurrence of the value 4
demo_list
[1, 2, 5, 8, 6, 9]
#append

demo_list.append(10)   # adds 10 to the end of the list
demo_list
[1, 2, 5, 8, 6, 9, 10]

You can perform many other list operations, such as extend(), count(), sort(), and more. Make sure you give them a try.
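As a starting point, the following sketch tries extend(), count(), and both in-place and out-of-place sorting on a small made-up list named `numbers`.

```python
numbers = [3, 1, 3]

numbers.extend([2, 3])          # append every item of another iterable
threes = numbers.count(3)       # how many times 3 occurs
ordered = sorted(numbers)       # new sorted list; the original is unchanged
numbers.sort(reverse=True)      # in-place sort, largest first

print(threes)    # 3
print(ordered)   # [1, 2, 3, 3, 3]
print(numbers)   # [3, 3, 3, 2, 1]
```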


Numpy Arrays

Numpy is a robust library for numerical computing in Python. An array is a grid of values that all share the same data type. The rank of an array is its number of dimensions. You can perform many operations on arrays, such as indexing, slicing, and more.
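Two of these properties can be checked directly: every element shares one dtype, and the rank is available as `.ndim`. In this sketch, the array `mixed` is a hypothetical example. Mixing ints with a float makes NumPy convert everything to a common floating-point type.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts every element to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# The rank (number of dimensions) is exposed as .ndim.
print(np.array([1, 2, 3]).ndim)          # 1
print(np.array([[1, 2], [3, 4]]).ndim)   # 2
```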

Let’s see what 1D and 2D arrays look like, and then perform some array operations on them.

#1D array

import numpy as np

demo_1D_array = np.array([11,22,33,44])
demo_1D_array
array([11, 22, 33, 44])
#2D array

demo_2D_array = np.array([[11,22,33,44],[55,66,77,88]])
demo_2D_array
array([[11, 22, 33, 44],
       [55, 66, 77, 88]])
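Indexing and slicing work per dimension on a 2D array. The sketch below redefines `demo_2D_array` so that it runs on its own, then reads its shape, a single element, part of a row, and a whole column. All indices are zero-based.

```python
import numpy as np

demo_2D_array = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

print(demo_2D_array.shape)    # (2, 4): 2 rows, 4 columns
print(demo_2D_array[1, 2])    # 77: row 1, column 2
print(demo_2D_array[0, 1:3])  # [22 33]: row 0, columns 1 and 2
print(demo_2D_array[:, 0])    # [11 55]: every row, column 0
```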

Now, let’s sum up all the values present in the array.

#sum

demo_2D_array.sum()
396
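Without arguments, sum() adds every element. The `axis` parameter restricts the sum to one direction: `axis=0` sums down each column and `axis=1` sums across each row. The array is redefined here so the snippet is self-contained.

```python
import numpy as np

demo_2D_array = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

total = demo_2D_array.sum()            # 396: every element
col_sums = demo_2D_array.sum(axis=0)   # 66, 88, 110, 132: down each column
row_sums = demo_2D_array.sum(axis=1)   # 110, 286: across each row

print(total, col_sums, row_sums)
```

The row sums add up to the overall total: 110 + 286 = 396.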

Fine. Can we now generate random values using Numpy?

#random numbers

random_numbers = np.random.randint(0,5,50)   # 50 random integers from 0 up to, but not including, 5
random_numbers
array([0, 3, 2, 2, 2, 3, 0, 1, 1, 1, 4, 4, 3, 0, 1, 4, 3, 2, 3, 1, 0, 0,
       3, 1, 0, 0, 3, 2, 2, 3, 2, 2, 0, 3, 4, 1, 1, 2, 4, 0, 3, 0, 4, 0,
       1, 0, 2, 4, 0, 0])

Perfect!
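np.random.randint belongs to NumPy's legacy random API. The NumPy documentation recommends the Generator API for new code. The following sketch is an alternative, not the article's original method. It uses `default_rng` with an arbitrary seed of 42 so that the "random" numbers are reproducible from run to run.

```python
import numpy as np

# A seeded Generator returns the same sequence on every run.
rng = np.random.default_rng(seed=42)
random_numbers = rng.integers(0, 5, size=50)   # 50 ints from 0 up to, not including, 5

# Re-creating the generator with the same seed reproduces the same values.
again = np.random.default_rng(seed=42).integers(0, 5, size=50)

print(random_numbers.shape)                    # (50,)
print(np.array_equal(random_numbers, again))   # True
```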


Pandas Series

A Series is a core data structure of Pandas and is created with pd.Series(). It is a one-dimensional labeled array that can hold data of any type.
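Two details are worth seeing in code. First, if no labels are given, a Series gets default integer labels. Second, mixed values are stored with the generic `object` dtype. The names `numbers` and `mixed` are illustrative.

```python
import pandas as pd

# Without an explicit index, labels default to 0, 1, 2, ...
numbers = pd.Series([10, 20, 30])
print(numbers.index.tolist())   # [0, 1, 2]

# Mixed values are kept as Python objects.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)              # object
```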

You can combine one or more Series into a DataFrame. First, let’s create a simple Series with the pd.Series() constructor.

#series

import pandas as pd
student = ['Jhon','Gracy','Spidy','Reko']
marks = [87,90,81,94]

#series labeled by student name

marks_series = pd.Series(marks, index = student)   # student names become the index (labels)
marks_series
Jhon     87
Gracy    90
Spidy    81
Reko     94
dtype: int64

Looks good.
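Labels let you look values up by name, and Series that share an index can be combined column-wise into a DataFrame. In this sketch, the pass mark of 85 and the resulting `passed` column are hypothetical. They are derived from the marks above so that no new data is introduced.

```python
import pandas as pd

student = ['Jhon', 'Gracy', 'Spidy', 'Reko']
marks = [87, 90, 81, 94]
marks_series = pd.Series(marks, index=student)

# Access by label, like a dictionary key.
print(marks_series['Gracy'])   # 90

# A derived Series (hypothetical pass mark of 85) shares the same index.
passed = marks_series >= 85

# Combining Series column-wise gives a DataFrame aligned on the index.
df = pd.DataFrame({'marks': marks_series, 'passed': passed})
print(df)
```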

You may now be wondering about the title of this article. I have introduced lists, arrays, and series so that I can show you how they differ.


Storage

A key difference between them is storage. Let’s store the same numbers in all three data structures and compare how much memory each one occupies.

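#storage

import sys

# lists, arrays and series are assumed to already hold the same numbers, e.g.
# lists = [...], arrays = np.array(lists), series = pd.Series(lists)
print(f"Lists:{sys.getsizeof(lists)} bytes")
print(f"Arrays:{sys.getsizeof(arrays)} bytes")
print(f"Series:{sys.getsizeof(series)} bytes")
Lists:136 bytes
Arrays:136 bytes
Series:184 bytes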

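We import sys to use sys.getsizeof(), which returns an object's size in bytes. Now, observe how much memory each structure takes. For a list, sys.getsizeof() counts only the list object and its references to the elements, not the integer objects themselves. The exact figures also depend on the data, the platform, and the library versions.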
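The original input values are not shown, so the following self-contained sketch uses a hypothetical sample of 1,000 integers. It measures each structure more completely:

* For the list, it adds up the int objects as well as the list container.
* For the array, it uses `ndarray.nbytes`, the size of the raw data buffer.
* For the Series, it uses `Series.memory_usage()`, which covers the values plus the index.

The dtype is fixed to int64 so the array size does not vary by platform.

```python
import sys

import numpy as np
import pandas as pd

# Hypothetical sample: the same 1,000 integers in all three structures.
numbers = list(range(1000))
numbers_array = np.array(numbers, dtype=np.int64)
numbers_series = pd.Series(numbers, dtype="int64")

# A list stores references; the int objects live elsewhere, so count them too.
list_container = sys.getsizeof(numbers)
list_total = list_container + sum(sys.getsizeof(n) for n in numbers)

# An ndarray stores raw 8-byte values in one contiguous buffer.
array_data = numbers_array.nbytes

# A Series stores an ndarray of values plus an index.
series_total = numbers_series.memory_usage(index=True, deep=True)

print(f"List container: {list_container} bytes")
print(f"List incl. int objects: {list_total} bytes")
print(f"Array data buffer: {array_data} bytes")
print(f"Series values + index: {series_total} bytes")
```

Measured this way, the list is the largest of the three, because each Python int is a separate object. The array and the Series keep their numbers in a compact typed buffer.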


Wrapping Up

Data structures are among the most important things to be familiar with if you work with data. In this article, I have shown three different data structures and the memory each one requires. I hope this was a short but informative introduction to them.

That’s all for now. Happy Python!!!

