pd.read_parquet
pandas.read_parquet
- example usage and more system-specific instructions
- Arguments `path` and `columns` are supported. `columns` should be a constant list of strings if provided. `path` can be a string or a list. If a string, it must be a path to a file or a directory, or a glob string. If a list, it must contain paths to Parquet files (not directories) or glob strings.
- Argument `anon` of `storage_options` is supported for S3 filepaths.
- If `path` can be inferred as a constant (e.g. it is a function argument), Bodo finds the schema from the file at compilation time. Otherwise, the schema should be provided using the Numba syntax.
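The accepted forms of `path` described above (a file, a directory, a glob string, or a list of files and glob strings) can be illustrated with a small standard-library sketch. This is not Bodo's implementation: `resolve_parquet_paths` is a hypothetical helper, and the non-recursive directory listing and sorted ordering are assumptions made only for the illustration.

```python
import glob
import os
import tempfile


def resolve_parquet_paths(path):
    """Expand `path` into a list of files (illustrative helper, not a Bodo API)."""
    if isinstance(path, str):
        if os.path.isdir(path):
            # Directory: take the files directly inside it (assumption: no recursion).
            entries = (os.path.join(path, name) for name in os.listdir(path))
            return sorted(e for e in entries if os.path.isfile(e))
        if os.path.isfile(path):
            return [path]
        # Anything else is treated as a glob string.
        return sorted(glob.glob(path))

    files = []
    for item in path:
        if os.path.isdir(item):
            # A list may hold files or glob strings, but not directories.
            raise ValueError(f"directories are not allowed in a list: {item}")
        files.extend([item] if os.path.isfile(item) else sorted(glob.glob(item)))
    return files


# Demo on empty placeholder files; only the file names matter here.
demo_dir = tempfile.mkdtemp()
for name in ("part-0.parquet", "part-1.parquet"):
    open(os.path.join(demo_dir, name), "w").close()

first = os.path.join(demo_dir, "part-0.parquet")

from_dir = resolve_parquet_paths(demo_dir)
from_glob = resolve_parquet_paths(os.path.join(demo_dir, "*.parquet"))
from_list = resolve_parquet_paths([first, os.path.join(demo_dir, "part-1*")])
```

All three calls resolve to the same two files, while passing `[demo_dir]` raises `ValueError`, mirroring the rule that a list may not contain directories.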
- `_bodo_input_file_name_col` is a Bodo-specific argument. When specified, a column with this name is added to the dataframe, holding the name of the file each row was read from. This is similar to SparkSQL's `input_file_name` function.
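To make the effect of `_bodo_input_file_name_col` concrete, the sketch below models the result in plain Python. It does not call Bodo or read Parquet: the file names, the rows, the column name `fname`, and the helper `tag_rows_with_file_name` are all hypothetical stand-ins, and the exact form of the file name Bodo stores (full path or otherwise) is not specified here.

```python
def tag_rows_with_file_name(files, col_name):
    """Concatenate rows from several files, adding `col_name` with each row's source.

    `files` maps a file name to its rows (a list of dicts). Both are in-memory
    stand-ins for Parquet files, used only to show the shape of the result.
    """
    tagged = []
    for file_name, rows in files.items():
        for row in rows:
            # Copy the row and append the extra column naming its source file.
            tagged.append({**row, col_name: file_name})
    return tagged


# Hypothetical data: two files with two rows each.
files = {
    "data/part-0.parquet": [{"A": 1}, {"A": 2}],
    "data/part-1.parquet": [{"A": 3}, {"A": 4}],
}
rows = tag_rows_with_file_name(files, "fname")
```

Each output row keeps its original columns and gains one more, so rows from different files remain distinguishable after they are combined into a single dataframe.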
- `_bodo_read_as_dict` is a Bodo-specific argument that forces the specified string columns to be read with dictionary encoding. Bodo automatically loads string columns using dictionary encoding when a heuristic determines that it would be beneficial. Dictionary encoding stores data efficiently in memory and is most effective when the column has many repeated values. Read more about dictionary-encoded layout here.
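The reason dictionary encoding pays off for repetitive columns can be seen in a pure-Python sketch of the general idea. This is a conceptual illustration only, assuming a simple "unique values plus integer indices" representation. Bodo's actual in-memory layout is not shown, and the function names and sample data are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, indices): unique values in first-seen order, one index per row."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            # First occurrence: store the string once and remember its slot.
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


# A column with many repeated values: 5000 rows, only 2 distinct strings.
column = ["new york", "boston", "new york", "new york", "boston"] * 1000
dictionary, indices = dictionary_encode(column)

plain_chars = sum(len(v) for v in column)        # characters stored without encoding
encoded_chars = sum(len(v) for v in dictionary)  # characters stored once in the dictionary
```

Here the string data shrinks from 36000 characters to 14, at the cost of one small integer per row. With mostly unique values the dictionary would be nearly as large as the column itself, which is why the encoding is most effective when values repeat often.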