`pd.merge`¶

pandas.merge(left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None, _bodo_na_equal=True)

Supported Arguments¶

argument	datatypes	other requirements
`left`	DataFrame
`right`	DataFrame
`how`	String	Must be one of `"inner"`, `"outer"`, `"left"`, `"right"` Must be constant at Compile Time
`on`	Column Name, List of Column Names, or General Merge Condition String (see merge-notes)	Must be constant at Compile Time
`left_on`	Column Name or List of Column Names	Must be constant at Compile Time
`right_on`	Column Name or List of Column Names	Must be constant at Compile Time
`left_index`	Boolean	Must be constant at Compile Time
`right_index`	Boolean	Must be constant at Compile Time
`suffixes`	Tuple of Strings	Must be constant at Compile Time
`indicator`	Boolean	Must be constant at Compile Time
`_bodo_na_equal`	Boolean	Must be constant at Compile Time This argument is unique to Bodo and not available in Pandas. If False, Bodo won't consider NA/nan keys as equal, which differs from Pandas.

Important

The argument _bodo_na_equal is unique to Bodo and not available in Pandas. If it is False, Bodo won't consider NA/nan keys as equal, which differs from Pandas.

Merge Notes¶

Output Ordering:

The output dataframe is not sorted by default for better parallel performance (Pandas may preserve key order depending on how). One can use explicit sort if needed.
General Merge Conditions:

Within Pandas, the merge criteria supported by pd.merge are limited to equality between 1 or more pairs of keys. For some use cases, this is not sufficient and more generalized support is necessary. For example, with these limitations, a left outer join where df1.A == df2.B & df2.C < df1.A cannot be efficiently computed.

Bodo supports these use cases by allowing users to pass general merge conditions to pd.merge. We plan to contribute this feature to Pandas to ensure full compatibility of Bodo and Pandas code.

General merge conditions are performed by providing the condition as a string via the on argument. Columns in the left table are referred to by left.{column name} and columns in the right table are referred to by right.{column name}.

Here's an example demonstrating the above:
```
>>> @bodo.jit
... def general_merge(df1, df2):
...   return df1.merge(df2, on="left.`A` == right.`B` & right.`C` < left.`A`", how="left")

>>> df1 = pd.DataFrame({"col": [2, 3, 5, 1, 2, 8], "A": [4, 6, 3, 9, 9, -1]})
>>> df2 = pd.DataFrame({"B": [1, 2, 9, 3, 2], "C": [1, 7, 2, 6, 5]})
>>> general_merge(df1, df2)

   col  A     B     C
0    2  4  <NA>  <NA>
1    3  6  <NA>  <NA>
2    5  3  <NA>  <NA>
3    1  9     9     2
4    2  9     9     2
5    8 -1  <NA>  <NA>
```
These calls have a few additional requirements:
- The condition must be constant string.
- The condition must be of the form cond_1 & ... & cond_N where at least one cond_i is a simple equality. This restriction will be removed in a future release.
- The columns specified in these conditions are limited to certain column types. We currently support boolean, integer, float, datetime64, timedelta64, datetime.date, and string columns.

Example Usage¶

>>> @bodo.jit
... def f(df1, df2):
...   return pd.merge(df1, df2, how="inner", on="key")

>>> df1 = pd.DataFrame({"key": [2, 3, 5, 1, 2, 8], "A": np.array([4, 6, 3, 9, 9, -1], float)})
>>> df2 = pd.DataFrame({"key": [1, 2, 9, 3, 2], "B": np.array([1, 7, 2, 6, 5], float)})
>>> f(df1, df2)

key    A    B
0    2  4.0  7.0
1    2  4.0  5.0
2    3  6.0  6.0
3    1  9.0  1.0
4    2  9.0  7.0
5    2  9.0  5.0

pd.merge¶

Supported Arguments¶

Merge Notes¶

Example Usage¶

`pd.merge`¶