pd.read_csv
pandas.read_csv
- See example usage and more system-specific instructions.
- `filepath_or_buffer` is required and must be a string. It can point to a single CSV file or to a directory containing multiple partitioned CSV files (the files inside the directory must have the `csv` extension).
- Arguments `sep`, `delimiter`, `header`, `names`, `index_col`, `usecols`, `dtype`, `nrows`, `skiprows`, `chunksize`, `parse_dates`, and `low_memory` are supported.
- The `anon` option of `storage_options` is supported for S3 file paths.
- To enable type inference, either provide both `names` and `dtype`, or make sure `filepath_or_buffer` can be inferred as a constant string. Bodo needs this to infer column types at compile time (see compile time constants).
- `names`, `usecols`, and `parse_dates` must be constant lists.
- `dtype` must be a constant dictionary mapping column names (strings) to types.
- `skiprows` must be an integer or a list of integers. If it is not a constant, `names` must be provided to enable type inference.
- `chunksize` is supported for uncompressed files only.
- `low_memory` makes the parser process the file internally in chunks. In Bodo it defaults to `False`.
  - When set to `True`, Bodo parses the file in chunks, but, as in pandas, the entire file is still read into a single DataFrame.
  - To load data in chunks, use the `chunksize` argument instead.
- When a CSV file is read in parallel (distributed mode) and each process reads only a portion of the file, columns whose values contain line breaks cannot be read.
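The constraints above can be illustrated with a small plain-pandas sketch. It assumes a hypothetical file `sales.csv` with hypothetical columns `id`, `region`, `amount`, and `day`, written to a temporary directory because Bodo expects `filepath_or_buffer` to be a string path. The helper names `load_typed` and `total_amount_in_chunks` are also hypothetical. In a Bodo program these functions would be decorated with `@bodo.jit`; that is an assumption here, since the sketch runs them as ordinary Python and does not exercise Bodo's compiler. `load_typed` passes `names`, `usecols`, and `parse_dates` as constant lists and `dtype` as a constant dictionary, while `total_amount_in_chunks` uses `chunksize` to process the file piece by piece.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data written to a real file, since Bodo expects
# filepath_or_buffer to be a string path rather than an in-memory buffer.
CSV_TEXT = (
    "id,region,amount,day\n"
    "1,east,10.5,2024-01-01\n"
    "2,west,3.0,2024-01-02\n"
    "3,east,7.25,2024-01-03\n"
    "4,north,1.0,2024-01-04\n"
    "5,west,8.0,2024-01-05\n"
)
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(path, "w") as f:
    f.write(CSV_TEXT)


# Hypothetical helper. In a Bodo program it would carry @bodo.jit (assumption,
# not verified here). Every argument is a literal: names, usecols and
# parse_dates are constant lists, and dtype is a constant dict of name -> type.
def load_typed(fname):
    return pd.read_csv(
        fname,
        header=0,  # replace the file's own header row with `names`
        names=["id", "region", "amount", "day"],
        usecols=["id", "amount", "day"],
        dtype={"id": np.int64, "amount": np.float64},
        parse_dates=["day"],
    )


# Hypothetical helper for chunked reading: with chunksize, read_csv yields
# DataFrames of at most `chunk_rows` rows instead of one large DataFrame.
# Per the notes above, Bodo supports chunksize for uncompressed files only.
def total_amount_in_chunks(fname, chunk_rows):
    total = 0.0
    n_chunks = 0
    with pd.read_csv(fname, chunksize=chunk_rows) as reader:
        for chunk in reader:
            total += float(chunk["amount"].sum())
            n_chunks += 1
    return total, n_chunks


df = load_typed(path)
total, n_chunks = total_amount_in_chunks(path, 2)
print(df.dtypes)
print(total, n_chunks)  # 29.75 3
```

Passing `header=0` together with `names` replaces the file's own header row. Without it, pandas would treat that row as data and the `int64` conversion of `id` would fail.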
- `_bodo_read_as_dict` is a Bodo-specific argument that forces the specified string columns to be read with dictionary encoding. Dictionary encoding stores data in memory efficiently and is most effective when a column has many repeated values. Read more about the dictionary-encoded layout here.
For example: