Hello, readers! In this article, we will look at the math functions offered by Python Pandas in detail.
So, let us begin!! 🙂
Python Pandas module – Quick overview
Python offers various modules to handle and manipulate data according to our requirements.
One such module is the Pandas module.
The Pandas module is one of the most widely used libraries for data analysis and modelling. It offers the DataFrame and Series data structures to store and manipulate data in the form of rows and columns. Further, it provides various functions to clean and process data for modelling.
Along similar lines, data analysis and modelling often call for mathematical functions that summarize the data.
Let us have a look at some easy-to-use math functions offered by the Pandas module.
Pandas math functions
In this article, we will focus on the following mathematical functions offered by the Pandas module:
- describe() function
- sum() function
- mean() and median() functions
- min() and max() functions
- value_counts() function
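The upcoming examples use a 19-row dataset with an id column, a categorical diagnosis column (M or B), and several numeric measurement columns such as radius_mean and area_mean.
Importing the dataset into the Python environment: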
import pandas as pd
data = pd.read_csv("C:\\Users\\Downloads\\datasets_180_408_data.csv")  # dataset
1. Pandas describe() function
With the Pandas describe() function, we can easily fetch summary statistics for every numeric column of the dataset, including:
- count of values
- mean
- standard deviation
- minimum and maximum values
- quartiles (25%, 50% and 75%), from which the inter-quartile range can be derived
Thus, if we want a quick statistical summary of the data, describe() is a good first choice.
                 id  radius_mean  texture_mean  perimeter_mean    area_mean  \
count  1.900000e+01    19.000000     19.000000       19.000000    19.000000
mean   4.049257e+07    16.081053     20.498947      106.725789   829.931579
std    4.293723e+07     2.942387      3.997922       19.297775   305.009648
min    8.423020e+05    11.420000     10.380000       77.580000   386.100000
25%    8.453085e+05    13.720000     18.935000       91.900000   578.100000
50%    8.490140e+05    15.850000     20.830000      103.600000   782.700000
75%    8.447960e+07    18.710000     22.925000      126.400000  1081.500000
max    8.486200e+07    20.570000     27.540000      135.100000  1326.000000

       smoothness_mean  compactness_mean  concavity_mean
count        19.000000         19.000000       19.000000
mean          0.107596          0.164038        0.158438
std           0.016457          0.068554        0.067645
min           0.082060          0.066690        0.032990
25%           0.097250          0.105850        0.099460
50%           0.109600          0.159900        0.163900
75%           0.118500          0.215750        0.202250
max           0.142500          0.283900        0.300100
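To try describe() without the CSV file, here is a minimal sketch on a small hand-made DataFrame. The column names mirror the dataset used in this article, but the frame `df` and its values are made up for illustration. It also shows one way to derive the inter-quartile range from the 25% and 75% rows.

```python
import pandas as pd

# Hypothetical sample data: column names follow the article's dataset,
# the values are invented for this sketch.
df = pd.DataFrame({
    "diagnosis": ["M", "M", "B", "M", "B"],
    "radius_mean": [17.99, 20.57, 13.0, 19.69, 12.45],
    "area_mean": [1001.0, 1326.0, 520.0, 1203.0, 477.1],
})

# describe() summarizes the numeric columns only by default,
# so the string column "diagnosis" is left out of the result.
summary = df.describe()
print(summary)

# The inter-quartile range is the 75% row minus the 25% row.
iqr = summary.loc["75%"] - summary.loc["25%"]
print(iqr)
```

The result of describe() is itself a DataFrame, so individual statistics can be picked out with `.loc`, as done for the quartiles here.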
2. The sum() function
Pandas stores data in rows and columns, so to get the total of every column we can use the sum() function.
For numeric columns, sum() returns the arithmetic total. For string columns such as diagnosis, it concatenates the values instead, as the output below shows.
id                            769358823
diagnosis           MMMMMBMMBMBBMBMMBBB
radius_mean                      305.54
texture_mean                     389.48
perimeter_mean                  2027.79
area_mean                       15768.7
smoothness_mean                 2.04432
compactness_mean                3.11673
concavity_mean                  3.01032
dtype: object
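The concatenated diagnosis string in the output above is usually not what we want from a sum. Here is a minimal sketch, again on a made-up frame `df` (hypothetical values, not the article's dataset), that contrasts the default call with `numeric_only=True`, which restricts the result to numeric columns.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [10.0, 12.5, 20.0],
    "area_mean": [400.0, 500.0, 1300.0],
})

# Default call: numeric columns are added up, strings are concatenated.
totals = df.sum()
print(totals)

# numeric_only=True drops non-numeric columns before summing.
numeric_totals = df.sum(numeric_only=True)
print(numeric_totals)
```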
3. Pandas mean() and median() functions
Mean and median are two of the most important statistical measures in data analysis.
With the Pandas module, we can use the mean() and median() functions to get the mean and the median value of every numeric data column.
id                  4.049257e+07
radius_mean         1.608105e+01
texture_mean        2.049895e+01
perimeter_mean      1.067258e+02
area_mean           8.299316e+02
smoothness_mean     1.075958e-01
compactness_mean    1.640384e-01
concavity_mean      1.584379e-01
dtype: float64

id                  849014.0000
radius_mean             15.8500
texture_mean            20.8300
perimeter_mean         103.6000
area_mean              782.7000
smoothness_mean          0.1096
compactness_mean         0.1599
concavity_mean           0.1639
dtype: float64
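In the output above, the first block is the mean and the second is the median. Below is a minimal sketch on a made-up frame `df` (hypothetical values). It passes `numeric_only=True` because, in pandas 2.0 and later, calling mean() on a frame that still contains a string column such as diagnosis raises a TypeError instead of silently skipping it.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
# The last row is a deliberate outlier to separate mean from median.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M", "B"],
    "radius_mean": [10.0, 12.0, 14.0, 24.0],
    "area_mean": [400.0, 500.0, 600.0, 1300.0],
})

# Restrict both statistics to the numeric columns.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

print(means)
print(medians)
```

The outlier pulls the mean of radius_mean up to 15.0 while the median stays at 13.0, which is why it helps to look at both values together.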
4. The min() and max() functions
- The Pandas min() function returns the minimum value of every column of the dataset.
- The max() function returns the maximum value of every column.
id                   842302
diagnosis                 B
radius_mean           11.42
texture_mean          10.38
perimeter_mean        77.58
area_mean             386.1
smoothness_mean     0.08206
compactness_mean    0.06669
concavity_mean      0.03299
dtype: object

id                  84862001
diagnosis                  M
radius_mean            20.57
texture_mean           27.54
perimeter_mean         135.1
area_mean               1326
smoothness_mean       0.1425
compactness_mean      0.2839
concavity_mean        0.3001
dtype: object
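Note that the diagnosis column shows B as its minimum and M as its maximum in the output above: strings are compared alphabetically. A minimal sketch on a made-up frame `df` (hypothetical values) reproduces this behaviour.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [10.0, 12.5, 20.0],
    "area_mean": [400.0, 500.0, 1300.0],
})

# Column-wise minimum and maximum; strings are ordered alphabetically.
lows = df.min()
highs = df.max()

print(lows)
print(highs)
```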
5. Pandas value_counts() function
In data science and analysis, when we deal with variables, especially categorical ones, it is important to understand how their values are distributed.
That is, for every categorical variable, we need to identify the categories or groups it contains and how often each one occurs.
With the value_counts() function, we can count the frequency of each distinct value in the data column on which we call it.
In the example below, we have calculated the frequency of every value in the column radius_mean:
19.81    1
16.02    1
18.25    1
13.00    1
12.46    1
17.99    1
20.57    1
14.68    1
14.54    1
16.13    1
15.78    1
13.71    1
15.85    1
11.42    1
20.29    1
12.45    1
13.73    1
19.17    1
19.69    1
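Since radius_mean is a continuous column, every value in the output above occurs exactly once. value_counts() tends to be more informative on a categorical column. Here is a minimal sketch on a made-up frame `df` (hypothetical values) that counts the diagnosis categories and uses `normalize=True` to get their relative shares.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "diagnosis": ["M", "M", "B", "M", "B"],
    "radius_mean": [17.99, 20.57, 13.0, 19.69, 12.45],
})

# Absolute frequency of each category, most frequent first.
counts = df["diagnosis"].value_counts()
print(counts)

# Relative frequency (fractions that add up to 1).
shares = df["diagnosis"].value_counts(normalize=True)
print(shares)
```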
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.
For more such posts related to Python programming, stay tuned with us!
Till then, Happy Learning!! 🙂