When compiling functions, Bodo introduces various optimizations to improve
runtime performance. Since the success of certain optimizations can be essential,
we provide the option to run Bodo in
verbose mode and track certain optimizations
at compile time. The information provided by
verbose mode can help you better understand your workload's performance
and debug the workload. Additionally, combining Python's
logging module with Bodo's
verbose mode lets you track optimizations across frequently running jobs.
Currently, all of our optimizations are tracked at compile time. This information is not stored with cached functions, so a function that is loaded from cache rather than compiled does not log it.
To detect important optimizations, all you need to do is set a verbose level in the global scope of a Python file using
bodo.set_verbose_level(level). The verbose level
is a non-negative integer, with greater values outputting more detailed information and 0 disabling the output. The optimizations that are
expected to be the most impactful are tracked at level 1, so in most situations you can just do
set_verbose_level(1). More information on the optimizations that are displayed is found in the
set_verbose_level API reference. Now when Bodo compiles a function,
rank 0 will log
important optimizations to
stderr using Python's
logging module. Below is an example using
verbose mode to verify that Bodo loads only the one column that is actually needed from a Parquet file, rather than any additional columns.
2022-03-24 11:50:21,656 - Bodo Default Logger - INFO -
================================================================================
---------------------------------Column Pruning---------------------------------
Finish column pruning on read_parquet node:

File "verbose_ex.py", line 8:
def load_data(filename):
    df = pd.read_parquet(filename)
    ^

Columns loaded ['id']
================================================================================
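The script behind this log is not reproduced here. A minimal sketch of what such a script might look like is given below; only `load_data`, `pd.read_parquet`, and the `id` column come from the log, while the `sum()` computation and the guarded import are assumptions added for illustration, so line numbers will not match the log.

```python
# verbose_ex.py (illustrative sketch, not the original script)
try:
    import bodo
    import pandas as pd
except ImportError:
    # Assumption for this sketch only: allow the file to be imported
    # on machines where Bodo is not installed.
    bodo = None


def load_data(filename):
    df = pd.read_parquet(filename)
    # Only the "id" column is used, so the other columns are not needed.
    # The sum is a stand-in; the original computation is unknown.
    return df["id"].sum()


if bodo is not None:
    # Set the verbose level in the global scope, before compilation.
    bodo.set_verbose_level(1)
    # Same effect as decorating load_data with @bodo.jit.
    load_data = bodo.jit(load_data)
```

Compilation happens on the first call to `load_data`, which is when a column pruning message like the one above would be logged.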
You can also send this information to any valid
logging.Logger instance by passing it to bodo.set_bodo_verbose_logger().
The logger should be a variable set in the global scope.
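As a rough sketch of this setup, the snippet below builds a global `logging.Logger` that writes to a file and hands it to Bodo. The logger name, the file name, and the guarded import are assumptions made for illustration; the format string simply mirrors the layout of the log line shown earlier.

```python
import logging

try:
    import bodo
except ImportError:
    # Assumption for this sketch only: keep the logging setup runnable
    # where Bodo is not installed.
    bodo = None

# The logger lives in the global scope of the file.
bodo_logger = logging.getLogger("bodo_optimizations")  # hypothetical name
# Bodo's messages are logged at INFO, so the level must be INFO or lower.
bodo_logger.setLevel(logging.INFO)

# delay=True defers opening the file until the first record is written.
handler = logging.FileHandler("bodo_optimizations.log", delay=True)  # hypothetical path
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)
bodo_logger.addHandler(handler)

if bodo is not None:
    bodo.set_verbose_level(1)
    bodo.set_bodo_verbose_logger(bodo_logger)
```

Writing to a file rather than stderr makes it easier to compare the optimizations reported by successive runs of the same job.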
Leveraging Optimizations for Debugging
Optimization logging can be useful for diagnosing possible performance issues. Below is the log produced when a function prints a section of a DataFrame to inspect its contents:
2022-03-25 11:22:24,619 - Bodo Default Logger - INFO -
================================================================================
---------------------------------Column Pruning---------------------------------
Finish column pruning on read_parquet node:

File "verbose_ex.py", line 10:
def load_data(filename):
    df = pd.read_parquet(filename)
    ^

Columns loaded ['id', 'Hectare', 'Date', 'Age', 'Primary Fur Color', 'Highlight Fur Color', 'Location', 'Specific Location', 'Running', 'Chasing', 'Climbing', 'Eating', 'Foraging', 'Other Activities', 'Kuks', 'Quaas', 'Moans', 'Tail flags', 'Tail twitches', 'Approaches', 'Indifferent', 'Runs from', 'Other Interactions']
df.head() prints every column in the DataFrame, so Bodo must load
all of the columns. Without this print, Bodo can load just a single
column from the Parquet file, so the print increases both memory usage and execution time.
In some situations the reason for an optimization failure may not be as straightforward as the example above. Even when the code is more complicated, the success/failure of optimizations can be an extremely useful first step to determine why performance is worse than expected.
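For jobs that run frequently, one way to automate this check is to parse the logged message and compare the loaded columns with the ones you expect. The sketch below assumes that the `Columns loaded [...]` line keeps the form seen in the logs above; that form is taken from the examples, not from a documented interface, and both helper names are hypothetical.

```python
import ast
import re

# Matches the list that follows "Columns loaded" in a column pruning message.
_COLUMNS_RE = re.compile(r"Columns loaded (\[.*?\])", re.DOTALL)


def columns_loaded(message):
    """Return the list of loaded columns, or None if the message has none."""
    match = _COLUMNS_RE.search(message)
    if match is None:
        return None
    # The list is printed as a Python literal, so literal_eval can read it.
    return ast.literal_eval(match.group(1))


def unexpected_columns(message, expected):
    """Return the loaded columns that are not in the expected collection."""
    loaded = columns_loaded(message)
    if loaded is None:
        return []
    return [column for column in loaded if column not in expected]
```

A non-empty result from `unexpected_columns` suggests that column pruning loaded more than the job needs, which is a reasonable point to start looking for the cause.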
bodo.set_verbose_level(level) determines whether compiled JIT functions output logging information. Level 0 disables optimization logging, and level 1 contains all of the most important optimizations.
The optimizations currently displayed at each level are:
Verbose Level 1:
- Column Pruning
- Filter Pushdown
- Dictionary Encoding
- Limit Pushdown
- BodoSQL generated IO time
- Join column pruning
Arguments:
- level: A non-negative integer for the logging granularity.
bodo.set_verbose_level() should not be used inside a JIT function.
bodo.set_bodo_verbose_logger(logger) sets the logging location for Bodo verbose messages. Bodo will write to this logger on
rank 0 only, to prevent possible conflicts when writing to an output file. All messages are logged with
logging.info, so the logger's effective level should be INFO or lower.
Arguments:
- logger: An instance of type logging.Logger.
bodo.set_bodo_verbose_logger() should not be used inside a JIT function.