Python Pandas math functions to know!


Hello, readers! In this article, we will look at Python Pandas math functions in detail.

So, let us begin! 🙂


Python Pandas module – Quick overview

Python offers various modules to handle and manipulate data according to our requirements.

One such module is the Python Pandas module.

Pandas is one of the most widely used libraries for data analysis and modelling. It offers the DataFrame and Series data structures to store and manipulate data in the form of rows and columns. It also provides various functions to clean and process the data before modelling.

Along similar lines, data analysis and modelling often call for mathematical functions that summarize the data.

Let us have a look at some simple and handy math functions offered by the Python Pandas module.


Pandas math functions

In this article, we will focus on the following mathematical functions offered by the Pandas module:

  • describe() function
  • value_counts() function
  • mean() and median() functions
  • sum() function
  • min() and max() functions

We will be making use of the below dataset in the upcoming examples.

[Image: Dataset]

Importing the above dataset into the Python environment:

import pandas as pd
data = pd.read_csv("C:\\Users\\Downloads\\datasets_180_408_data.csv") # dataset

1. Pandas describe() function

With the pandas describe() function, we can quickly fetch summary statistics for every numeric column of the dataset:

  • count of non-null values
  • mean
  • standard deviation
  • minimum value
  • 25%, 50% and 75% percentiles (the 50% value is the median)
  • maximum value

The inter-quartile range is not reported directly, but it is the difference between the 75% and 25% values. By default, non-numeric columns such as diagnosis are left out of the summary.

Thus, if we want a quick statistical overview of the data, describe() is a good first choice.

Example–

print(data.describe())

Output–

                 id  radius_mean  texture_mean  perimeter_mean    area_mean  \
count  1.900000e+01    19.000000     19.000000       19.000000    19.000000   
mean   4.049257e+07    16.081053     20.498947      106.725789   829.931579   
std    4.293723e+07     2.942387      3.997922       19.297775   305.009648   
min    8.423020e+05    11.420000     10.380000       77.580000   386.100000   
25%    8.453085e+05    13.720000     18.935000       91.900000   578.100000   
50%    8.490140e+05    15.850000     20.830000      103.600000   782.700000   
75%    8.447960e+07    18.710000     22.925000      126.400000  1081.500000   
max    8.486200e+07    20.570000     27.540000      135.100000  1326.000000   

       smoothness_mean  compactness_mean  concavity_mean  
count        19.000000         19.000000       19.000000  
mean          0.107596          0.164038        0.158438  
std           0.016457          0.068554        0.067645  
min           0.082060          0.066690        0.032990  
25%           0.097250          0.105850        0.099460  
50%           0.109600          0.159900        0.163900  
75%           0.118500          0.215750        0.202250  
max           0.142500          0.283900        0.300100  
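As a small illustration of reading the describe() result, the sketch below pulls the median and the inter-quartile range out of the summary table. The DataFrame here is a hypothetical stand-in with made-up values, not the article's CSV, and the names df, summary, median and iqr are chosen only for this example.

```python
import pandas as pd

# Hypothetical stand-in data; these values are NOT from the article's dataset.
df = pd.DataFrame({
    "radius_mean": [10.0, 12.0, 14.0, 16.0, 18.0],
    "area_mean": [300.0, 400.0, 500.0, 600.0, 700.0],
})

# describe() returns a DataFrame whose index holds the statistic names.
summary = df.describe()

# The "50%" row is the median of each column.
median = summary.loc["50%"]

# The inter-quartile range is the 75% value minus the 25% value.
iqr = summary.loc["75%"] - summary.loc["25%"]

print(summary)
print(median)
print(iqr)
```

Because the result of describe() is itself a DataFrame, any statistic can be selected by its row label with .loc, as done here.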

2. The sum() function

The Pandas module deals with data in the form of rows and columns, so to get the total of every column we can use the sum() function.

With sum(), we get the numerical total of every numeric column in the dataset. For a text column such as diagnosis, sum() concatenates the strings instead, as the output below shows.

Example–

print(data.sum())

Output–

id                            769358823
diagnosis           MMMMMBMMBMBBMBMMBBB
radius_mean                      305.54
texture_mean                     389.48
perimeter_mean                  2027.79
area_mean                       15768.7
smoothness_mean                 2.04432
compactness_mean                3.11673
concavity_mean                  3.01032
dtype: object
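If the concatenated text column is not wanted, one option is to restrict sum() to numeric columns. The sketch below uses a hypothetical three-row DataFrame (made-up values) to show numeric_only=True for column totals and axis=1 for row totals.

```python
import pandas as pd

# Hypothetical stand-in data; these values are NOT from the article's dataset.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [10.0, 12.5, 14.0],
    "texture_mean": [20.0, 18.0, 22.0],
})

# numeric_only=True skips the text column instead of concatenating it.
column_totals = df.sum(numeric_only=True)

# axis=1 sums across the columns, giving one total per row.
row_totals = df[["radius_mean", "texture_mean"]].sum(axis=1)

print(column_totals)
print(row_totals)
```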

3. Pandas mean() and median() functions

Mean and median are two of the most important statistical measures for analysis.

With the Pandas module, we can use the mean() and median() functions to get the mean and the median of every numeric data column.

Example–

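# numeric_only=True skips the text column diagnosis;
# without it, recent pandas versions raise a TypeError on text columns
print(data.mean(numeric_only=True))
print(data.median(numeric_only=True))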

Output–

id                  4.049257e+07
radius_mean         1.608105e+01
texture_mean        2.049895e+01
perimeter_mean      1.067258e+02
area_mean           8.299316e+02
smoothness_mean     1.075958e-01
compactness_mean    1.640384e-01
concavity_mean      1.584379e-01
dtype: float64

id                  849014.0000
radius_mean             15.8500
texture_mean            20.8300
perimeter_mean         103.6000
area_mean              782.7000
smoothness_mean          0.1096
compactness_mean         0.1599
concavity_mean           0.1639
dtype: float64
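The output above shows the mean of id far above its median, which hints at a skewed column. The sketch below illustrates the same effect on a hypothetical DataFrame (made-up values) where a single outlier pulls the mean up while the median stays put.

```python
import pandas as pd

# Hypothetical stand-in data; the last area_mean value is a deliberate outlier.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M", "B", "M"],
    "area_mean": [400.0, 500.0, 600.0, 700.0, 5000.0],
})

# numeric_only=True keeps the text column out of the calculation.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

print(means)    # the mean is dragged upward by the outlier
print(medians)  # the median is the middle value and is unaffected
```

Comparing the two values is a quick way to spot columns where a few extreme entries dominate the average.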

4. The min() and max() functions

  1. The Pandas min() function returns the minimum value of every column of the dataset.
  2. The max() function returns the maximum value of every column.

For a text column such as diagnosis, the values are compared alphabetically, so min() returns B and max() returns M in the output below.

Example–

print(data.min())
print(data.max())

Output–

id                   842302
diagnosis                 B
radius_mean           11.42
texture_mean          10.38
perimeter_mean        77.58
area_mean             386.1
smoothness_mean     0.08206
compactness_mean    0.06669
concavity_mean      0.03299
dtype: object

id                  84862001
diagnosis                  M
radius_mean            20.57
texture_mean           27.54
perimeter_mean         135.1
area_mean               1326
smoothness_mean       0.1425
compactness_mean      0.2839
concavity_mean        0.3001
dtype: object
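A common follow-up is the spread of each numeric column, that is, the maximum minus the minimum. The sketch below shows this on a hypothetical DataFrame with made-up values; value_range is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical stand-in data; these values are NOT from the article's dataset.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [11.5, 20.5, 15.0],
})

# Without numeric_only, the text column is compared alphabetically.
all_min = df.min()

# Restrict to numeric columns so the results can be subtracted.
num_min = df.min(numeric_only=True)
num_max = df.max(numeric_only=True)
value_range = num_max - num_min

print(all_min)
print(value_range)
```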

5. Pandas value_counts() function

In data science and analysis, when we deal with variables, especially categorical ones, it is important to understand how the values are distributed within each variable.

That is, for every categorical variable, we need to identify the categories or groups it contains and how often each one occurs.

With the value_counts() function, we can count the frequency of every distinct value in the data column on which we call it.

Example–

In the example below, we calculate the frequency of every value in the column radius_mean. Since radius_mean is a continuous column, every value occurs exactly once here:

print(data.radius_mean.value_counts())

Output–

19.81    1
16.02    1
18.25    1
13.00    1
12.46    1
17.99    1
20.57    1
14.68    1
14.54    1
16.13    1
15.78    1
13.71    1
15.85    1
11.42    1
20.29    1
12.45    1
13.73    1
19.17    1
19.69    1
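value_counts() tends to be more informative on a categorical column. The sketch below uses a hypothetical diagnosis column (made-up values) and also shows the normalize=True option, which returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical stand-in data; these values are NOT from the article's dataset.
df = pd.DataFrame({"diagnosis": ["M", "M", "B", "M", "B"]})

# Frequency of each category, most frequent first.
counts = df["diagnosis"].value_counts()

# normalize=True returns each category's share of the total.
shares = df["diagnosis"].value_counts(normalize=True)

print(counts)
print(shares)
```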

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to Python programming, stay tuned with us!

Till then, Happy Learning! 🙂
