- setting up a command with several steps
- the command is slow, but where?
- adding –timestamp to spot the bottleneck
- adding –debug for the full picture
- going deeper with –profiling
I have a command that feels slow and I want to find out why.
setting up a command with several steps
Let’s say I have this command that simulates several steps. It logs a summary at info level and detailed steps at debug level.
import time
from clk.decorators import command
from clk.log import get_logger
LOGGER = get_logger(__name__)
@command()
def slowcmd():
"""A command with several steps."""
LOGGER.debug("starting step 1: fetch config")
LOGGER.debug("starting step 2: heavy computation")
time.sleep(3)
LOGGER.debug("starting step 3: write results")
LOGGER.info("done")
the command is slow, but where?
Running it, all I see is “done” — no clue about what took so long:
clk slowcmd 2>&1
done
adding –timestamp to spot the bottleneck
Adding --timestamp prefixes each log line with a timestamp. I can confirm it is slow, but there is only one log line — not enough to pinpoint the cause:
clk --timestamp slowcmd 2>&1
2024-02-14 23:00:06,000 done
adding –debug for the full picture
Adding --debug reveals the debug log lines. Combined with --timestamp, I can now see exactly where the time is spent:
clk --debug --timestamp slowcmd 2>&1
2024-02-14 23:00:06,000 debug: starting step 1: fetch config
2024-02-14 23:00:06,000 debug: starting step 2: heavy computation
2024-02-14 23:00:09,000 debug: starting step 3: write results
2024-02-14 23:00:09,000 done
2024-02-14 23:00:09,000 debug: command `clk/__main__.py --debug --timestamp slowcmd` run in 3 seconds
The three-second gap between step 2 and step 3 is immediately visible, pointing to the heavy computation as the bottleneck.
going deeper with –profiling
Now I know that step 2 is the bottleneck. To understand what exactly is happening inside, I can use --profiling to get function-level detail:
clk --profiling slowcmd 2>&1 | grep sleep
1 0.000 0.000 0.000 0.000 clk/core.py:0(_fake_sleep)
Here _fake_sleep appears because the test environment replaces time.sleep with a fake. In real life, the output would show {built-in method time.sleep} with 3 seconds of cumulative time, confirming where the bottleneck is.