Native SQL with Database Catalogs¶
Database Catalogs are configuration objects that grant BodoSQL access to load tables from a remote database. The Bodo Platform supports adding Database Catalogs through the UI and lets users write native SQL code that runs against the tables in the connected remote database.
Adding a Database Catalog¶
Supported On AWS · Supported on Azure
In your workspaces view, navigate to the Catalogs section in the sidebar. Click on CREATE CATALOG and fill in the form with the required values.
Currently, we support Snowflake Database Catalogs, Tabular Database Catalogs and AWS Glue Catalogs on the Bodo Platform.
See SnowflakeCatalog, TabularCatalog, and GlueCatalog for details on the required parameters.
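As an illustration, a Snowflake catalog form typically asks for connection details along the following lines. The field names below are a hedged sketch, not the exact form labels; consult the SnowflakeCatalog reference for the authoritative list:

```python
# Illustrative only: the exact fields required by the catalog creation
# form are defined in the SnowflakeCatalog reference documentation.
snowflake_catalog_details = {
    "name": "my_snowflake_catalog",       # display name for the catalog
    "username": "SF_USER",                # Snowflake user
    "password": "********",               # Snowflake password (stored securely)
    "account": "myorg-myaccount",         # Snowflake account identifier
    "warehouse": "COMPUTE_WH",            # warehouse used to run queries
    "database": "SNOWFLAKE_SAMPLE_DATA",  # database to resolve tables against
}
print(sorted(snowflake_catalog_details))
```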
Upon submitting the form, you will see that your Catalog has been created and is now available to use in your interactive notebook.
Running a Job With a Database Catalog¶
Supported On AWS · Supported on Azure
To run a SQL job with a database catalog, create a job template in the Jobs tab.
Configure the job template as usual; under Advanced Options, select the Catalog you want to use.
Catalogs can also be specified for jobs submitted through the bodosdk. When the job runs, the SQL code executes against the tables in the connected remote database.
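Conceptually, a SQL job definition ties a job template, a catalog, and a query together. The payload below is a hypothetical sketch of that shape, not the real bodosdk classes or parameter names; see the bodosdk documentation for the actual API:

```python
# Hypothetical job-definition payload -- the real bodosdk API and
# parameter names may differ; this only illustrates the relationship
# between a template, a catalog, and the SQL to run.
sql_job = {
    "template": "nightly-report",        # job template configured in the Jobs tab
    "type": "SQL",
    "catalog": "my_snowflake_catalog",   # catalog chosen under Advanced Options
    "sql_query": "SELECT COUNT(*) FROM ORDERS",
}
print(sql_job["catalog"])
```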
Using Database Catalogs in Interactive Notebooks¶
Supported On AWS · Supported on Azure
Important
Using Database Catalogs in Interactive Notebooks is only supported for Snowflake Database Catalogs.
When you create a code cell in your interactive notebook, you will notice a blue selector in the top right-hand corner of the code cell. By default, this is set to Parallel-Python, meaning any code written in the cell executes on all cores of the attached cluster.
To run native SQL code, set the cell type to SQL in the blue selector and choose your Catalog from the Catalog selector to the left of the cell type selector, as shown in the figure below.
The output of the SQL query is automatically saved in a distributed dataframe named LAST_SQL_OUTPUT. This dataframe will be overwritten every time a SQL query is run.
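For example, after a SQL cell runs, a subsequent Parallel-Python cell can read LAST_SQL_OUTPUT directly. The sketch below emulates that flow with a plain list of rows standing in for the distributed dataframe, so it is self-contained:

```python
# In a SQL cell (with a Catalog selected) you might run:
#   SELECT C_NAME, C_ACCTBAL FROM CUSTOMER LIMIT 3
# The platform stores the result in LAST_SQL_OUTPUT; here we emulate
# that result with a stand-in list of rows for illustration.
LAST_SQL_OUTPUT = [
    {"C_NAME": "Customer#1", "C_ACCTBAL": 711.56},
    {"C_NAME": "Customer#2", "C_ACCTBAL": 121.65},
    {"C_NAME": "Customer#3", "C_ACCTBAL": 7498.12},
]

# A following Parallel-Python cell can then post-process the query result:
high_balance = [r["C_NAME"] for r in LAST_SQL_OUTPUT if r["C_ACCTBAL"] > 500]
print(high_balance)  # ['Customer#1', 'Customer#3']
```

Because LAST_SQL_OUTPUT is overwritten by each SQL cell, assign it to another variable if you need to keep a result across multiple queries.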
Viewing Database Catalogs Data¶
To view the connection data stored in a catalog, first connect to a cluster and then run the following in a code cell:
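The shape of the result might resemble the sketch below. The helper name here is hypothetical, not the actual function exposed in Bodo notebooks; it only illustrates the kind of (non-secret) connection data a catalog stores:

```python
# Hypothetical helper -- the real call for inspecting a catalog from a
# notebook may have a different name and signature.
def get_catalog_connection_data(catalog_name):
    # In a real notebook this would ask the platform for the catalog's
    # stored connection parameters; we return a representative stand-in
    # so this sketch is self-contained and runnable.
    return {
        "catalog": catalog_name,
        "type": "SNOWFLAKE",
        "account": "myorg-myaccount",
        "warehouse": "COMPUTE_WH",
        "database": "SNOWFLAKE_SAMPLE_DATA",
    }

print(get_catalog_connection_data("my_snowflake_catalog"))
```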
See Also