Verbose Mode¶
When compiling functions, Bodo introduces various optimizations to improve
runtime performance. Since the success of certain optimizations can be essential,
we provide the option to run Bodo in verbose mode
and track certain optimizations
at compile time. The information provided by verbose mode
can help you better understand you workload's performance
as well as how to debug the workload. Additionally, using Python's logging
module alongside Bodo's verbose mode
can be used to track optimizations across frequently running jobs.
Note
Currently all of our optimizations are tracked at compile time. This information is not stored if a function is cached.
Example Usage¶
To detect important optimizations, all you need to do is set a verbose level in the global scope of a Python file using bodo.set_verbose_level(level)
. The verbose level
is a positive integer, with greater values outputting more detailed information. The optimizations that are
expected to be the most impactful are tracked at level 1, so in most situations you can just do bodo.
set_verbose_level(1)
. More information on the optimizations that are displayed is found in the
set_verbose_level
API reference. Now when Bodo compiles a function, rank 0
will log
important optimizations to stderr
using Python's logging
package.
Below is an example using the verbose mode
to verify that Bodo is only loading the 1 column from a parquet file that is actually needed as opposed to any additional columns.
bodo.set_verbose_level(1)
@bodo.jit
def load_data(filename):
df = pd.read_parquet(filename)
return df.id
load_data("my_file.pq")
2024-03-24 10:44:21,023 - Bodo Default Logger - INFO -
================================================================================
--------------------------------Filter Pushdown---------------------------------
Arrow filters pushed down:
None
None
================================================================================
2024-03-24 10:44:21,023 - Bodo Default Logger - INFO -
================================================================================
---------------------------------Column Pruning---------------------------------
Finish column pruning on read_parquet node:
File "objmode.py", line 10:
def load_data(filename):
df = pd.read_parquet(filename)
^
Columns loaded ['id']
================================================================================
You can also log this information to a valid logging.Logger
instance with Bodo.
Important
The logger should be a variable set in a global scope.
bodo.set_verbose_level(1)
logger = logging.getLogger("myLogger")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler("example.log"))
bodo.set_bodo_verbose_logger(logger)
@bodo.jit
def load_data(filename):
df = pd.read_parquet(filename)
return df.id
load_data("my_file.pq")
The output will be written to example.log
in the current working directory.
## Leveraging Optimizations for Debugging
Optimzation logging can be useful for diagnosing possible performance issues. Below is an example that shows the impact of printing a section of a DataFrame to inspect the result of `read_parquet`.
```py
bodo.set_verbose_level(1)
@bodo.jit
def load_data(filename):
df = pd.read_parquet(filename)
print(df.head(10))
return df.id
load_data("my_file.pq")
2024-03-24 10:48:19,046 - Bodo Default Logger - INFO -
================================================================================
--------------------------------Filter Pushdown---------------------------------
Arrow filters pushed down:
None
None
================================================================================
2024-03-24 10:48:19,047 - Bodo Default Logger - INFO -
================================================================================
---------------------------------Column Pruning---------------------------------
Finish column pruning on read_parquet node:
File "objmode.py", line 11:
def load_data(filename):
df = pd.read_parquet(filename)
^
Columns loaded ['id', 'Hectare', 'Date', 'Age', 'Primary Fur Color', 'Highlight Fur Color', 'Location', 'Specific Location', 'Running', 'Chasing', 'Climbing', 'Eating', 'Foraging', 'Other Activities', 'Kuks', 'Quaas', 'Moans', 'Tail flags', 'Tail twitches', 'Approaches', 'Indifferent', 'Runs from', 'Other Interactions']
Printing df.head()
prints every column in the DataFrame, so Bodo must load
all of the columns. In contrast, without this print, Bodo can load just a single
column from the parquet file, so this increases both memory usage and execution time.
In some situations the reason for an optimization failure may not be as straightforward as the example above. Even when the code is more complicated, the success/failure of optimizations can be an extremely useful first step to determine why performance is worse than expected.
User APIs¶
set_verbose_level¶
-
bodo. set_verbose_level (level)Determines if compiled JIT functions should output logging information. Level 0 disables optimization logging and level 1 contains all of the most important operations.
The optimizations currently displayed at each level are:
Verbose Level Optimizations 1 - Column Pruning
- Filter Pushdown
- Dictionary Encoding
- Limit Pushdown
- BodoSQL generated IO time
2 - Join column pruning
Arguments
- level: A non-negative integer for the logging granularity.
Note:
bodo.set_verbose_level()
should not be used inside a JIT function.
set_bodo_verbose_logger¶
-
bodo. set_bodo_verbose_logger (logger)Sets the logging location for Bodo verbose messages. Bodo will write to this logger on
rank 0
only, to prevent possible conflicts when writing to an output file. All messages are given withlogging.info
, so the logger should have an appropriate effect level.Arguments
logger
: An instance of typelogging.Logger
.
Note:
bodo.set_bodo_verbose_logger()
should not be used inside a JIT function.