Integrating Bodo with Front-End Tools
Bodo can be integrated with front-end tools to build real-time analytics dashboards. This page provides a walk-through of creating a Streamlit app with Bodo on your laptop or VM.
All the code referenced on this page is available here, and the steps for running the app are provided below.
The Taxi Pickup App
This app is based on a demo from the official Streamlit documentation, which explores a public Uber dataset for pickups and drop-offs in New York City.
We will essentially read a Parquet file into a dataframe, convert the string date/time column to datetime data, and return the dataframe to be plotted in the app:
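import pandas as pd

def load_data_pandas(pq_file_path, date_col='date/time'):
    # Read the Parquet file into a dataframe.
    data = pd.read_parquet(pq_file_path)
    # Convert the string date/time column to datetime values.
    data[date_col] = pd.to_datetime(data[date_col])
    return data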
Bodo version of the Taxi Pickup App
To run the app using Bodo, we will use the same process as running the app on an IPyParallel cluster. For this app, we want to visualize all the data, so in the Bodo version of this function, we disable automatic data distribution by setting the returns_maybe_distributed flag to False, and use bodo.gatherv to gather all the data onto a single process:
@bodo.jit(returns_maybe_distributed=False, cache=True)
def load_data_bodo(pq_file_path, date_col='date/time'):
    data = pd.read_parquet(pq_file_path)
    data[date_col] = pd.to_datetime(data[date_col])
    return bodo.gatherv(data)
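To make the effect of gathering concrete, the following plain-Python sketch simulates four ranks that each hold a slice of the rows and then collects every slice onto one rank. It does not call Bodo: the helper simulate_gather, its root parameter, and the sample rows are hypothetical, and it assumes that only the root rank holds data after the gather.

```python
def simulate_gather(chunks_by_rank, root=0):
    """Return what each simulated rank holds after gathering onto `root`.

    chunks_by_rank: list where item i is the list of rows held by rank i.
    Hypothetical helper for illustration; it is not part of Bodo.
    """
    # Concatenate the per-rank slices in rank order to rebuild the full table.
    gathered = [row for chunk in chunks_by_rank for row in chunk]
    # Assumption: only the root rank keeps the data; other ranks hold nothing.
    return [gathered if rank == root else [] for rank in range(len(chunks_by_rank))]


# Eight sample rows split evenly across four simulated ranks.
rows = [f"pickup-{i}" for i in range(8)]
chunks = [rows[i:i + 2] for i in range(0, len(rows), 2)]
after_gather = simulate_gather(chunks)
```

In this simulation, after_gather[0] contains all eight rows and the remaining entries are empty lists, which is the single-process view the app needs for plotting.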
We define a Python wrapper for load_data_bodo:
def build_main(pq_file_path, date_col='date/time'):
    # Pass the caller's column name through instead of hard-coding it.
    op_df = load_data_bodo(pq_file_path, date_col=date_col)
    return op_df
Finally, we need a function to send the imports and code definitions to the MPI engines, call the load_data_bodo function, and then return the result to the client:
import inspect
import time

import ipyparallel as ipp

def initialize_bodo(pq_file_path, date_col='date/time'):
    t0 = time.time()
    client = ipp.Client(profile='mpi')
    dview = client[:]
    # import libraries
    dview.execute("import numpy as np")
    dview.execute("import pandas as pd")
    dview.execute("import bodo")
    dview.execute("import time")
    dview.execute("import os")
    dview.execute("import datetime as dt")
    dview.execute("import sys")
    bodo_funcs = [load_data_bodo]
    for f in bodo_funcs:
        # get source code of Bodo function
        f_src = inspect.getsource(f)
        # execute the source code thereby defining the function on engines
        dview.execute(f_src).get()
    op_df = dview.apply(build_main, pq_file_path, 'Date/Time').get()
    t1 = time.time()
    print("Total Exec + Compilation time:", t1-t0)
    client.close()
    return op_df
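The key step in initialize_bodo is shipping a function's source text to each engine and executing it there, so that the function exists in the engine's namespace before it is called. The sketch below imitates that step locally, with plain dictionaries standing in for engine namespaces. The names define_on_engines, engine_namespaces, and add_one are hypothetical, and no IPyParallel cluster is involved.

```python
# Source text of the function to ship. In initialize_bodo this string comes
# from inspect.getsource(f); here it is written inline to stay self-contained.
f_src = '''
def add_one(x):
    return x + 1
'''


def define_on_engines(src, namespaces):
    """Execute `src` in every namespace, defining its functions there."""
    for ns in namespaces:
        exec(src, ns)


# Four dictionaries stand in for the namespaces of four engines (hypothetical).
engine_namespaces = [{} for _ in range(4)]
define_on_engines(f_src, engine_namespaces)

# Each simulated engine can now call the function by name.
results = [ns["add_one"](rank) for rank, ns in enumerate(engine_namespaces)]
```

Because the definition is executed separately in each namespace, every simulated engine ends up with its own copy of the function, which mirrors why the source has to be sent before the wrapper is applied.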
Building the Streamlit Visualization
We create the Streamlit App by adding the title, creating some headers and printing out some basic information about our app:
st.title('Scale up your datasets and make Pandas fly with Bodo!')
st.subheader('Based on Streamlit example for Uber pickups in NYC')
st.subheader(' - > Basic Info')
st.subheader('Number of physical cores/ranks available on system: %s' % psutil.cpu_count(logical=False))
We first run the Pandas app and see how long it takes:
t0 = time.time()
pdf = load_data_pandas(pq_file_path, date_col='Date/Time')
t1 = time.time()
st.subheader('Pandas df')
st.subheader('Time taken for one op with Pandas:')
st.subheader(t1-t0)
st.write(pdf.head(2))  # print two rows to check output.
We do the same with Bodo:
t2 = time.time()
bdf = initialize_bodo(pq_file_path, date_col='Date/Time')
t3 = time.time()
st.subheader('Bodo df')
st.subheader('Total Compilation and Execution time taken for one op with Bodo:')
st.subheader(t3-t2)
st.write(bdf.head(2))
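Both measurements follow the same pattern of taking a timestamp before and after a call. A small helper can factor that pattern out. The sketch below uses time.perf_counter, a standard-library clock intended for measuring elapsed time; the names timed and slow_sum are hypothetical and do not appear in the app.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_sum(n):
    # Stand-in workload (hypothetical); replace with the loader to be measured.
    return sum(range(n))


total, seconds = timed(slow_sum, 100_000)
```

With such a helper, the Pandas and Bodo loaders could be measured by the same code path, which keeps the comparison consistent.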
We can also visualize the data in a histogram showing the pickups by hour:
DATE_COLUMN = 'date/time'
lowercase = lambda x: str(x).lower()
bdf.rename(lowercase, axis='columns', inplace=True)
st.subheader('Number of pickups by hour')
# np.histogram returns (counts, bin_edges); keep only the counts for the chart.
hist_values = np.histogram(bdf[DATE_COLUMN].dt.hour, bins=24, range=(0,24))[0]
st.bar_chart(hist_values)
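The histogram counts how many pickups fall into each of the 24 one-hour bins. As a sanity check on what that computes, here is a plain-Python sketch that produces the same kind of per-hour counts without NumPy or pandas. The sample timestamps, their format string, and the function name pickups_by_hour are assumptions for illustration, not values taken from the dataset.

```python
from collections import Counter
from datetime import datetime


def pickups_by_hour(timestamps, fmt="%m/%d/%Y %H:%M:%S"):
    """Return a list of 24 counts, one per hour of the day."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    # Hours with no pickups are reported as zero so the list always has 24 bins.
    return [hours.get(h, 0) for h in range(24)]


# Hypothetical sample values; the real file may use a different format.
sample = ["09/01/2014 00:01:00", "09/01/2014 00:45:10", "09/01/2014 17:30:00"]
counts = pickups_by_hour(sample)
```

Here two of the sample pickups land in the midnight bin and one in the 17:00 bin, so the resulting list has the same shape as the per-hour counts passed to the bar chart.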
Running the Taxi Pickup App
Clone the Bodo Examples repository and navigate to the streamlit directory. The directory has the following structure:
streamlit
├── README.md
├── app.py
├── config.py
├── environment.yml
├── pd_vs_Bodo.png
├── sample_parquet_file.pq
We have provided an environment.yml file to create a conda environment with all the required dependencies. The app code is stored in app.py, and some configuration parameters, such as the input file and the path to the current directory, are set in config.py. We have also provided a sample Parquet file, sample_parquet_file.pq, to test the app with.
Please ensure that the path to the current directory is set in the config.py file.
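As a rough idea of what such a configuration module could look like, the sketch below defines the two settings mentioned above. The variable names CURRENT_DIR and PQ_FILE_PATH are hypothetical and may not match the names used in the repository's config.py; only the file name sample_parquet_file.pq comes from the directory listing.

```python
from pathlib import Path

# Path to the current directory (assumes the app is launched from the
# streamlit directory, as described in the steps on this page).
CURRENT_DIR = Path.cwd()

# Input Parquet file; the sample file sits next to app.py in the listing.
PQ_FILE_PATH = CURRENT_DIR / "sample_parquet_file.pq"
```

Deriving the input path from the directory setting means only one value needs to change if the app is moved.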
Start the IPyParallel controller and engines
Create a conda environment from the provided environment.yml file, and activate the conda environment.
Append the current directory to your Python path (the PYTHONPATH environment variable), then start the IPyParallel controller.
In a new terminal, activate the stlbodo conda environment. You will need to append the current directory to your Python path again. Use the following command to start a set of MPI engines:
mpiexec -n 4 python -m ipyparallel.engine --mpi --profile-dir ~/.ipython/profile_mpi --cluster-id '' --log-level=DEBUG
Run the Streamlit App
Open another terminal and activate the stlbodo conda environment. Navigate to the streamlit directory, and then start the app with the streamlit run command, passing app.py as the script.
You should now be able to open the app in a browser window and see the output for yourself. Note that it takes roughly one and a half minutes for the Pandas output to show up and, after that, less than a minute (including compilation time) for the Bodo output and visualization to show up.
If you face any issues while running the app, please let us know through our Feedback repository, or join our community Slack to communicate directly with Bodo engineers.