pd.read_csv

pandas.read_csv

Example usage and more system-specific instructions are documented separately.
`filepath_or_buffer` should be a string and is required. It can point to a single CSV file or to a directory containing multiple partitioned CSV files (the files inside the directory must have a `csv` extension).
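Plain pandas `read_csv` does not accept a directory, so the sketch below emulates in pandas what a partitioned read amounts to: every `.csv` file in the directory is read and the pieces are concatenated into one DataFrame. It assumes each part file has its own header row and that part order follows sorted file names; `read_csv_dir` is a hypothetical helper, not Bodo's implementation.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_dir(path):
    """Hypothetical helper: read every *.csv file in a directory into one DataFrame."""
    # Only files with a .csv extension are picked up, mirroring the rule above
    files = sorted(glob.glob(os.path.join(path, "*.csv")))
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


# Build a small partitioned dataset to read back
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "part-0.csv"), "w") as fh:
    fh.write("A,B\n1,x\n2,y\n")
with open(os.path.join(tmpdir, "part-1.csv"), "w") as fh:
    fh.write("A,B\n3,z\n")
with open(os.path.join(tmpdir, "_SUCCESS"), "w") as fh:
    fh.write("")  # not a .csv file, so it is ignored

df = read_csv_dir(tmpdir)
print(df)
```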
Arguments `sep`, `delimiter`, `header`, `names`, `index_col`, `usecols`, `dtype`, `nrows`, `skiprows`, `chunksize`, `parse_dates`, and `low_memory` are supported.
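A minimal plain-pandas sketch (it runs without Bodo) combining several of the supported arguments. The file name and contents are made up for illustration; inside a `@bodo.jit` function the same call would also be subject to the typing rules listed below.

```python
import os
import tempfile

import pandas as pd

# A semicolon-separated file with a header row
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
with open(path, "w") as fh:
    fh.write("id;name;score\n1;a;0.5\n2;b;0.25\n3;c;0.75\n")

df = pd.read_csv(
    path,
    sep=";",      # non-default field separator
    header=0,     # first line holds the column names
    index_col=0,  # use the "id" column as the index
    nrows=2,      # read only the first two data rows
)
print(df)
```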
The `anon` option of `storage_options` is supported for S3 file paths.
Either both `names` and `dtype` must be provided to enable type inference, or `filepath_or_buffer` must be inferable as a constant string. Bodo requires this so that it can infer the column types at compile time; see compile time constants.
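The sketch below shows the first option in plain pandas: a headerless file whose column names and types are passed explicitly through `names` and `dtype`, so no schema has to be read from the data. The file and column names are hypothetical; under Bodo the point of passing them is compile-time type inference, which this pandas snippet does not exercise.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A headerless file: column names and types are supplied by the caller
path = os.path.join(tempfile.mkdtemp(), "no_header.csv")
with open(path, "w") as fh:
    fh.write("1,3.5,x\n2,4.5,y\n")

df = pd.read_csv(
    path,
    names=["A", "B", "C"],  # constant list of column names
    dtype={"A": np.int64, "B": np.float64, "C": str},  # constant dict
)
print(df.dtypes)
```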
`names`, `usecols`, and `parse_dates` should be constant lists.
`dtype` should be a constant dictionary mapping column names (strings) to types.
`skiprows` must be an integer or a list of integers; if it is not a constant, `names` must be provided to enable type inference.
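A plain-pandas sketch of these argument forms, assuming a hypothetical file with a one-line comment before the header: `usecols` and `parse_dates` are literal lists and `skiprows` is an integer literal, which is the kind of constant the rules above ask for.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "events.csv")
with open(path, "w") as fh:
    fh.write(
        "# exported report\n"
        "date,event,count,notes\n"
        "2023-01-01,a,1,n1\n"
        "2023-01-02,b,2,n2\n"
    )

df = pd.read_csv(
    path,
    skiprows=1,                          # skip the comment line before the header
    usecols=["date", "event", "count"],  # constant list: drop "notes"
    parse_dates=["date"],                # constant list: parse as datetimes
)
print(df.dtypes)
```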
chunksize is supported for uncompressed files only.
`low_memory` makes the parser process the file internally in chunks. In Bodo it is set to `False` by default.
When it is set to `True`, Bodo parses the file in chunks but, as in Pandas, still reads the entire file into a single DataFrame.
To load data in chunks, use the `chunksize` argument.
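The following plain-pandas sketch contrasts this with `low_memory`: passing `chunksize` returns an iterator of DataFrames rather than a single DataFrame, so each chunk can be processed and discarded. The file is uncompressed, matching the restriction above; the column name and sizes are arbitrary.

```python
import os
import tempfile

import pandas as pd

# Uncompressed file with one column and ten rows (0..9)
path = os.path.join(tempfile.mkdtemp(), "big.csv")
with open(path, "w") as fh:
    fh.write("A\n" + "".join(f"{i}\n" for i in range(10)))

total = 0
n_chunks = 0
# With chunksize, read_csv yields DataFrames of at most 4 rows each
with pd.read_csv(path, chunksize=4) as reader:
    for chunk in reader:
        total += chunk["A"].sum()
        n_chunks += 1

print(n_chunks, total)  # 3 chunks (4 + 4 + 2 rows), sum 45
```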
When a CSV file is read in parallel (distributed mode) and each process reads only a portion of the file, reading columns that contain line breaks is not supported.
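To see why this matters, the plain-pandas sketch below reads a file in which one quoted value spans two physical lines. Read as a whole, the file yields two rows; a process that starts reading partway through the file (for example at `second line"`) cannot tell that the preceding newline sits inside quotes. This is our hedged reading of why such columns are unsupported in distributed mode, not a description of Bodo's parser.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "multiline.csv")
# newline="" keeps the embedded "\n" unchanged on every platform
with open(path, "w", newline="") as fh:
    # The quoted value in row 1 spans two physical lines
    fh.write('id,comment\n1,"first line\nsecond line"\n2,plain\n')

df = pd.read_csv(path)
print(len(df))  # 2 rows, although the file has 4 physical lines
```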
`_bodo_read_as_dict` is a Bodo-specific argument that forces the specified string columns to be read with dictionary encoding. Dictionary encoding stores each distinct value once and represents the column as integer indices into those values, so it uses memory efficiently and is most effective when the column has many repeated values. Read more about the dictionary-encoded layout here.

For example:
```
import bodo
import pandas as pd

@bodo.jit
def impl(f):
    # Read string columns "A", "B" and "C" with dictionary encoding
    df = pd.read_csv(f, _bodo_read_as_dict=["A", "B", "C"])
    return df
```
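As a conceptual illustration of dictionary encoding only, the sketch below uses `pandas.factorize` to split a repetitive string column into a dictionary of distinct values and per-row integer codes. It is not Bodo's internal representation, which this page does not describe.

```python
import pandas as pd

# A string column with many repeated values
values = pd.Series(["red", "blue", "red", "red", "blue", "green"])

# Dictionary encoding: keep each distinct string once, plus one small
# integer code per row that points into that dictionary
codes, dictionary = pd.factorize(values)
print(list(dictionary))  # ['red', 'blue', 'green']
print(codes.tolist())    # [0, 1, 0, 0, 1, 2]

# Decoding (looking each code up in the dictionary) recovers the column
decoded = dictionary.take(codes)
assert list(decoded) == list(values)
```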