Introduction to Continuous Profiling for Python


Profiling in computer programming is a form of dynamic code analysis. While an application is running, you can use a program or tool to collect the running characteristics of the application. We collect this information in the form of metrics. These metrics are analyzed to uncover bottlenecks and performance issues that arise during runtime. They can also be used to see how we can optimize the application to run faster. 

Continuous profiling takes this a step further: it means profiling an application while it runs in the production environment. By profiling continuously, we can discover bugs and optimize the application under real production load, which saves resource costs in the long term.

Profiling in Python

Let’s briefly discuss the various profiling options available in Python.

1. Profiling in the development environment

Profiling in the development environment is achieved using Python packages like cProfile and line_profiler.

1.1  Profiling using cProfile:

Developers can use this module to either profile the whole program or a certain section of the code by embedding it in the code.

1.1.1 Whole program

cProfile can be run on the whole program.

The following command can be used to create the output shown in the screenshot below. Here, your_script.py is a placeholder for the script you want to profile.

python -m cProfile -s tottime your_script.py


The output table contains the following columns, each denoting a different metric:

ncalls: number of calls made to the function

tottime: total time spent in the function itself, excluding time spent in the functions it calls

percall: tottime divided by ncalls

cumtime: cumulative time spent in the function and in all the functions it calls

percall: cumtime divided by the number of primitive (non-recursive) calls
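The same columns can also be produced from inside a Python program. The following is a minimal sketch using only the standard library modules cProfile and pstats; the functions build_squares and main are hypothetical workloads invented for illustration and are not part of the original article.

```python
import cProfile
import io
import pstats


def build_squares(n):
    """Hypothetical workload used only for illustration."""
    return [i * i for i in range(n)]


def main():
    for _ in range(50):
        build_squares(10_000)


profiler = cProfile.Profile()
profiler.runcall(main)  # profile a single call to main()

# Render the report into a string, sorted by tottime like "-s tottime".
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)  # limit the report to 5 rows
report = buffer.getvalue()
print(report)
```

The printed report starts with a header row containing ncalls, tottime, percall, cumtime, and percall, followed by one row per profiled function.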

1.1.2 Target profiling

Target profiling means profiling only a selected part of the application instead of the whole program.


The metrics in this image are similar to what we have discussed in whole program profiling earlier.
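As a rough sketch of how target profiling might look in code, the snippet below wraps only one call with the enable() and disable() methods of cProfile.Profile. The functions slow_sum and untracked_setup are hypothetical names chosen for this example; only the code between enable() and disable() is measured.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot we want to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def untracked_setup():
    """Runs outside the profiled region, so it is not in the report."""
    return list(range(1000))


untracked_setup()

profiler = cProfile.Profile()
profiler.enable()            # start collecting statistics here
result = slow_sum(200_000)
profiler.disable()           # stop collecting statistics here

# Write the report to a string, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumtime").print_stats()
target_report = buffer.getvalue()
print(target_report)
```

Because the setup call happens before enable(), it does not show up in the report, which keeps the output focused on the section under investigation.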

1.2 Profiling using line_profiler:

This package is not part of the standard library and needs to be installed explicitly with pip (pip install line_profiler). Once it is installed, decorate the functions you want to analyze with the @profile decorator and run the script with the following command, where your_script.py is a placeholder for your own file.

kernprof -l -v your_script.py

Here, the -l argument specifies line-by-line profiling, and -v prints the results to the console as soon as the script finishes.
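A decorated script might look like the sketch below. The file name example_line_profile.py and the function normalize are hypothetical. When the script is run through kernprof -l, the profile name is made available automatically; the small fallback at the top is an assumption added here so that the same file also runs with plain python.

```python
# example_line_profile.py (hypothetical file name)

try:
    profile  # provided as a builtin when run via "kernprof -l"
except NameError:
    def profile(func):
        """No-op fallback so the script also runs without kernprof."""
        return func


@profile
def normalize(values):
    # Each of these lines gets its own timing row in the kernprof output.
    total = sum(values)
    scaled = [v / total for v in values]
    return scaled


if __name__ == "__main__":
    print(normalize([1, 2, 3, 4]))
```

Running kernprof -l -v example_line_profile.py would then report hits and time for each line of normalize.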


2. Profiling in the production environment/continuous profiling

Profiling in the production environment is slightly different from profiling in development because of some obvious constraints: the inability to modify the running code too frequently and the testing and integration involved before the code is merged. Also, the production workloads may have an uneven distribution of load throughout a day or week, which is quite common in real-world scenarios.

For this reason, there are profiling tools on the market that are built to be integrated with production workloads. They continuously gather statistics about running code and visualize them in the form of flame graphs or tables. You can use Datadog, Google Cloud Profiler, or Amazon CodeGuru Profiler for this purpose. Later in this article, we’ll be looking at a demonstration using gProfiler, which is an open source continuous profiler.

Before we discuss a tool for continuous profiling, we have to understand which metrics can be recorded.

2.1 Metrics recorded in continuous profiling 

2.1.1 CPU time

This metric is the CPU time taken by a specific block of code. It counts only the time the CPU spends executing the code and excludes any time spent waiting, for example for I/O or for CPU resources.

2.1.2 Wall-clock time

This metric shows the overall time taken by a block of code to execute. It includes both the execution time and any time spent waiting for resources.
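The difference between the two metrics can be seen with a small experiment. The sketch below uses time.process_time() for CPU time and time.perf_counter() for wall-clock time; the helper measure and the two task functions are hypothetical names introduced for this illustration.

```python
import time


def measure(func):
    """Return (cpu_seconds, wall_seconds) for one call of func."""
    cpu_start = time.process_time()    # CPU time used by this process
    wall_start = time.perf_counter()   # elapsed real time
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


def waiting_task():
    time.sleep(0.2)  # waits without using the CPU


def busy_task():
    sum(i * i for i in range(300_000))  # keeps the CPU busy


wait_cpu, wait_wall = measure(waiting_task)
busy_cpu, busy_wall = measure(busy_task)
print(f"waiting: cpu={wait_cpu:.3f}s wall={wait_wall:.3f}s")
print(f"busy:    cpu={busy_cpu:.3f}s wall={busy_wall:.3f}s")
```

For the sleeping task, the wall-clock time is dominated by waiting while the CPU time stays close to zero; for the busy task, the two values are typically much closer together.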

2.1.3 Heap usage and allocation

This metric tells us how much heap memory the running application was using at the moment the profiler took the snapshot.

The heap allocation metric reports the overall memory allocated on the program heap over time. By comparing the two metrics, we can identify memory-intensive areas and memory leaks within the program. We can also learn which allocation sites make the garbage collector work harder.
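To get a feel for these memory metrics locally, the standard library module tracemalloc can take snapshots of Python-level allocations and compare them. This is only an approximation of what a continuous profiler records, and the growing cache list below is a hypothetical stand-in for a leak.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical "leak": a module-level cache that only ever grows.
cache = [bytes(1024) for _ in range(2000)]  # roughly 2 MB kept alive

after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Source lines whose allocations changed the most between the snapshots.
top_growth = after.compare_to(before, "lineno")[:3]
for stat in top_growth:
    print(stat)
print(f"current={current} bytes, peak={peak} bytes")
```

Here current plays the role of heap usage at a point in time, while the comparison between snapshots points at the lines responsible for the growth.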

2.1.4 Threading

This metric tells us about thread leaks (increase in the number of threads) and thread instances that were created but never ran.
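A simplified way to observe both situations in plain Python is to compare threading.active_count() against a baseline and to check is_alive() on thread objects. The worker function and the variable names in this sketch are hypothetical.

```python
import threading

baseline = threading.active_count()
stop = threading.Event()


def worker():
    stop.wait()  # hypothetical worker that blocks until told to stop


started = [threading.Thread(target=worker) for _ in range(3)]
for t in started:
    t.start()

never_started = threading.Thread(target=worker)  # created but never run

# Threads still alive above the baseline hint at a possible thread leak.
leaked = threading.active_count() - baseline
print(f"threads above baseline: {leaked}")
print(f"never_started alive? {never_started.is_alive()}")

# Clean up so that the example exits promptly.
stop.set()
for t in started:
    t.join()
after_cleanup = threading.active_count() - baseline
```

A thread count that keeps climbing above the baseline over time is the pattern a threading metric would surface in production.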

2.1.5 Contention

This metric applies to multi-threaded programs and measures the time threads spend waiting to access a shared region of code or a shared resource, such as a lock. In heavily multi-threaded programs, this waiting time can be significant.
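As a rough illustration of contention, the sketch below has four threads compete for a single threading.Lock and records how long each one waited before entering the shared region. The 0.05 second sleep stands in for hypothetical work done while holding the lock, and all names are invented for this example.

```python
import threading
import time

lock = threading.Lock()             # protects the shared region
wait_times = []                     # seconds each thread waited for the lock
wait_times_guard = threading.Lock()


def worker():
    requested = time.perf_counter()
    with lock:
        waited = time.perf_counter() - requested
        time.sleep(0.05)            # hypothetical work under the lock
    with wait_times_guard:
        wait_times.append(waited)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_wait = sum(wait_times)
print(f"total time spent waiting for the lock: {total_wait:.3f}s")
```

The more threads queue up behind the same lock, the larger the total waiting time becomes, even though none of that time is spent doing useful work.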

2.2 Continuous Profiling using gProfiler

As mentioned earlier, this section walks through a demonstration of gProfiler. This tool is an open source continuous profiler that can be deployed to a production environment and starts profiling the application code with minimal impact on the application’s performance.

The main advantage of using gProfiler is its plug and play support: it gathers insights about applications written in various programming languages without the need for code changes. In addition, the performance overhead of running the profiler is kept low, since it runs in the background alongside the application. The insights it provides can also help you reduce your application’s CPU usage and cloud computation costs. All this makes it much easier for you to improve the performance of your application.

It took me barely 10 minutes to start using gProfiler on my application. If you also want to give it a try, here is how you can do so:

1. Register for an evaluation copy on this website.

2. Once you have registered, you will receive a confirmation and activation email from the team.

3. Once the above steps are completed, head over to the link and log in with your registered credentials.

The portal will look like this.



4. Click the Install Service button at the lower-left corner of the menu bar to install the service. You will have four options for installing it: Docker, DaemonSet, command line, or Databricks. Enter a service name, make sure you remember it, and click the Submit button to generate the plug and play commands for your service.

5. Install the service using the generated plug and play commands, and the profiler will start working.

6. Wait for a couple of minutes, then head over to the View option and select the service from the dropdown menu. Observe the flame graph that depicts the current state of profiling.

7. You can click on each process to see its CPU utilization percentage and the number of samples taken during profiling. You can also download the flame graph or share the workspace with others if needed.


Once you have the statistics about which processes consume the most time and CPU resources, you can debug your code using the profiling methods discussed earlier to remove the bottlenecks. That is how you continuously monitor profiling data and improve the bottleneck areas within your application code.


The software development life cycle is a continuous iterative process. So is the case with improving code performance. No matter how well-written or optimized the code is, there are always unknown or grey areas around how it will behave in the production environment. You can reduce the technical debt around performance, but the actual running metrics will tell you the real story. Once you have insights about what you can improve by looking at the actual production run, you can improve the overall performance of your application. 
