Native SQL with Database Catalogs¶
Database Catalogs are configuration objects that grant BodoSQL access to load tables from a remote database. The Bodo Platform supports adding Database Catalogs through the UI and lets users write native SQL code that runs against the tables in the connected remote database.
Adding a Database Catalog¶
Supported On AWS · Supported on Azure
In your workspaces view, navigate to the Catalogs section in the sidebar. Click on CREATE CATALOG and fill in the form with the required values.
Currently, we support Snowflake Database Catalogs, Tabular Database Catalogs and AWS Glue Catalogs on the Bodo Platform.
See SnowflakeCatalog, TabularCatalog, and GlueCatalog for details on the required parameters.
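As an illustration, a Snowflake catalog form typically asks for connection details along the following lines. The field names below are a hedged sketch, not the exact form labels; consult the SnowflakeCatalog reference for the authoritative list:

```python
# Illustrative only: the exact fields required by the catalog creation
# form are defined in the SnowflakeCatalog reference documentation.
snowflake_catalog_details = {
    "name": "my_snowflake_catalog",       # display name for the catalog
    "username": "SF_USER",                # Snowflake user
    "password": "********",               # Snowflake password (stored securely)
    "account": "myorg-myaccount",         # Snowflake account identifier
    "warehouse": "COMPUTE_WH",            # warehouse used to run queries
    "database": "SNOWFLAKE_SAMPLE_DATA",  # database to resolve tables against
}
print(sorted(snowflake_catalog_details))
```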
Upon submitting the form, you will see that your Catalog has been created and is now available to use in your interactive notebook.
Running a Job With a Database Catalog¶
Supported On AWS · Supported on Azure
To run a SQL job with a database catalog, create a job template in the Jobs tab.
Configure the job template as usual; under Advanced Options, select the Catalog you want to use.
Catalogs can also be specified for jobs submitted through the bodosdk. When the job runs, the SQL code executes against the tables in the connected remote database.
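Conceptually, a SQL job definition ties a job template, a catalog, and a query together. The payload below is a hypothetical sketch of that shape, not the real bodosdk classes or parameter names; see the bodosdk documentation for the actual API:

```python
# Hypothetical job-definition payload -- the real bodosdk API and
# parameter names may differ; this only illustrates the relationship
# between a template, a catalog, and the SQL to run.
sql_job = {
    "template": "nightly-report",        # job template configured in the Jobs tab
    "type": "SQL",
    "catalog": "my_snowflake_catalog",   # catalog chosen under Advanced Options
    "sql_query": "SELECT COUNT(*) FROM ORDERS",
}
print(sql_job["catalog"])
```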
Using Database Catalogs in Interactive Notebooks¶
Supported On AWS · Supported on Azure
Important
Using Database Catalogs in Interactive Notebooks is only supported for Snowflake Database Catalogs.
When you create a code cell in your interactive notebook, you will notice a blue selector in the top right-hand corner of the code cell. By default, this is set to Parallel-Python, meaning any code written in the cell executes on all cores of the attached cluster.
To run native SQL code, set the cell type to SQL in the blue selector and choose your Catalog from the Catalog selector to the left of the cell type selector, as shown in the figure below.
The output of the SQL query is automatically saved in a distributed dataframe named LAST_SQL_OUTPUT. This dataframe will be overwritten every time a SQL query is run.
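For example, after a SQL cell runs, a subsequent Parallel-Python cell can read LAST_SQL_OUTPUT directly. The sketch below emulates that flow with a plain list of rows standing in for the distributed dataframe, so it is self-contained:

```python
# In a SQL cell (with a Catalog selected) you might run:
#   SELECT C_NAME, C_ACCTBAL FROM CUSTOMER LIMIT 3
# The platform stores the result in LAST_SQL_OUTPUT; here we emulate
# that result with a stand-in list of rows for illustration.
LAST_SQL_OUTPUT = [
    {"C_NAME": "Customer#1", "C_ACCTBAL": 711.56},
    {"C_NAME": "Customer#2", "C_ACCTBAL": 121.65},
    {"C_NAME": "Customer#3", "C_ACCTBAL": 7498.12},
]

# A following Parallel-Python cell can then post-process the query result:
high_balance = [r["C_NAME"] for r in LAST_SQL_OUTPUT if r["C_ACCTBAL"] > 500]
print(high_balance)  # ['Customer#1', 'Customer#3']
```

Because LAST_SQL_OUTPUT is overwritten by each SQL cell, assign it to another variable if you need to keep a result across multiple queries.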
Viewing Database Catalogs Data¶
To view the connection data stored in a catalog, first connect to a cluster and then run the following in a code cell:
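The shape of the result might resemble the sketch below. The helper name here is hypothetical, not the actual function exposed in Bodo notebooks; it only illustrates the kind of (non-secret) connection data a catalog stores:

```python
# Hypothetical helper -- the real call for inspecting a catalog from a
# notebook may have a different name and signature.
def get_catalog_connection_data(catalog_name):
    # In a real notebook this would ask the platform for the catalog's
    # stored connection parameters; we return a representative stand-in
    # so this sketch is self-contained and runnable.
    return {
        "catalog": catalog_name,
        "type": "SNOWFLAKE",
        "account": "myorg-myaccount",
        "warehouse": "COMPUTE_WH",
        "database": "SNOWFLAKE_SAMPLE_DATA",
    }

print(get_catalog_connection_data("my_snowflake_catalog"))
```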
See Also