pd.read_csv
pandas.read_csv
- example usage and more system specific instructions
- `filepath_or_buffer` should be a string and is required. It can point to a single CSV file or to a directory containing multiple partitioned CSV files (files inside the directory must have the `csv` file extension).
- Arguments `sep`, `delimiter`, `header`, `names`, `index_col`, `usecols`, `dtype`, `nrows`, `skiprows`, `chunksize`, `parse_dates`, and `low_memory` are supported.
- Argument `anon` of `storage_options` is supported for S3 filepaths.
- Either the `names` and `dtype` arguments should be provided to enable type inference, or `filepath_or_buffer` should be inferrable as a constant string. This is required so Bodo can infer the types at compile time; see compile time constants.
- `names`, `usecols`, and `parse_dates` should be constant lists.
- `dtype` should be a constant dictionary of strings and types.
- `skiprows` must be an integer or a list of integers; if it is not a constant, `names` must be provided to enable type inference.
- `chunksize` is supported for uncompressed files only.
- `low_memory` internally processes the file in chunks while parsing. In Bodo this is set to `False` by default.
- When `low_memory` is set to `True`, Bodo parses the file in chunks but, like Pandas, the entire file is read into a single DataFrame regardless.
- If you want to load data in chunks, use the `chunksize` argument.
- When a CSV file is read in parallel (distributed mode) and each process reads only a portion of the file, reading columns that contain line breaks is not supported.
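
The argument rules above can be illustrated with a minimal sketch in plain pandas. It is not Bodo-specific: the sample data and the names `CSV_TEXT`, `COLUMN_NAMES`, `USE_COLUMNS`, `COLUMN_DTYPES`, `DATE_COLUMNS`, `read_all`, and `sum_price_in_chunks` are hypothetical, and an in-memory buffer stands in for a file path so the snippet runs anywhere. Under Bodo, the same `pd.read_csv` call would presumably sit inside a `@bodo.jit` function and read from a path, with the lists and the dictionary written as literals so they are known at compile time.

```python
import io

import pandas as pd

# Hypothetical sample data; with Bodo the source would normally be a file path.
CSV_TEXT = """id,price,when,note
1,9.5,2021-01-01,a
2,3.25,2021-01-02,b
3,7.0,2021-01-03,c
4,1.5,2021-01-04,d
"""

# Literal (constant) arguments: lists for names, usecols and parse_dates,
# and a dictionary mapping column names to types for dtype.
COLUMN_NAMES = ["id", "price", "when", "note"]
USE_COLUMNS = ["id", "price", "when"]
COLUMN_DTYPES = {"id": "int64", "price": "float64"}
DATE_COLUMNS = ["when"]


def read_all(buffer):
    # header=0 tells pandas to replace the header row in the data with COLUMN_NAMES.
    return pd.read_csv(
        buffer,
        header=0,
        names=COLUMN_NAMES,
        usecols=USE_COLUMNS,
        dtype=COLUMN_DTYPES,
        parse_dates=DATE_COLUMNS,
    )


def sum_price_in_chunks(buffer, chunksize=2):
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of a single DataFrame.
    total = 0.0
    reader = pd.read_csv(
        buffer,
        header=0,
        names=COLUMN_NAMES,
        dtype=COLUMN_DTYPES,
        chunksize=chunksize,
    )
    for chunk in reader:
        total += chunk["price"].sum()
    return total


df = read_all(io.StringIO(CSV_TEXT))
chunked_total = sum_price_in_chunks(io.StringIO(CSV_TEXT))
print(df.dtypes)
print(chunked_total)
```

Supplying both `names` and `dtype` mirrors the first of the two type-inference options listed above; the chunked reader shows the `chunksize` route for processing data piece by piece, as opposed to `low_memory`, which still yields one DataFrame.
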
- `_bodo_read_as_dict` is a Bodo-specific argument that forces the specified string columns to be read with dictionary encoding. Dictionary encoding stores data in memory efficiently and is most effective when the column has many repeated values. Read more about dictionary-encoded layout here. For example: