Supported Numpy Operations

Below is the list of the data-parallel Numpy operators that Bodo can optimize and parallelize.

1. Numpy element-wise array operations:

• Unary operators: `+` `-` `~`

• Binary operators: `+` `-` `*` `/` `/?` `%` `|` `>>` `^` `<<` `&` `**` `//`

• Comparison operators: `==` `!=` `<` `<=` `>` `>=`

• Data-parallel math operations:

• `numpy.add()`

• `numpy.subtract()`

• `numpy.multiply()`

• `numpy.divide()`

• `numpy.logaddexp()`

• `numpy.logaddexp2()`

• `numpy.true_divide()`

• `numpy.floor_divide()`

• `numpy.negative()`

• `numpy.positive()`

• `numpy.power()`

• `numpy.remainder()`

• `numpy.mod()`

• `numpy.fmod()`

• `numpy.abs()`

• `numpy.absolute()`

• `numpy.fabs()`

• `numpy.rint()`

• `numpy.sign()`

• `numpy.conj()`

• `numpy.exp()`

• `numpy.exp2()`

• `numpy.log()`

• `numpy.log2()`

• `numpy.log10()`

• `numpy.expm1()`

• `numpy.log1p()`

• `numpy.sqrt()`

• `numpy.square()`

• `numpy.reciprocal()`

• `numpy.gcd()`

• `numpy.lcm()`

• `numpy.conjugate()`

• Trigonometric functions:

• `numpy.sin()`

• `numpy.cos()`

• `numpy.tan()`

• `numpy.arcsin()`

• `numpy.arccos()`

• `numpy.arctan()`

• `numpy.arctan2()`

• `numpy.hypot()`

• `numpy.sinh()`

• `numpy.cosh()`

• `numpy.tanh()`

• `numpy.arcsinh()`

• `numpy.arccosh()`

• `numpy.arctanh()`

• `numpy.deg2rad()`

• `numpy.rad2deg()`

• `numpy.degrees()`

• `numpy.radians()`

• Bit manipulation functions:

• `numpy.bitwise_and()`

• `numpy.bitwise_or()`

• `numpy.bitwise_xor()`

• `numpy.bitwise_not()`

• `numpy.invert()`

• `numpy.left_shift()`

• `numpy.right_shift()`

• Comparison functions:

• `numpy.logical_and()`

• `numpy.logical_or()`

• `numpy.logical_xor()`

• `numpy.logical_not()`

• Floating functions:

• `numpy.isfinite()`

• `numpy.isinf()`

• `numpy.signbit()`

• `numpy.ldexp()`

• `numpy.floor()`

• `numpy.ceil()`

• `numpy.trunc()`

2. Numpy reduction functions:

3. Numpy array creation functions:

4. Numpy array manipulation functions:

5. Numpy mathematical and statistics functions:

6. Random number generator functions:

7. `numpy.dot()` function between a matrix and a vector, or two vectors.

8. Numpy array comprehensions, such as:

```
A = np.array([i**2 for i in range(N)])
```

9. Numpy I/O: `numpy.ndarray.tofile()` and `numpy.fromfile()`. The File I/O section contains example usage and more system-specific instructions.
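
As a plain-NumPy illustration of the two I/O calls named in item 9, the sketch below writes an array with `tofile()` and reads it back with `fromfile()`. The temporary file and the helper name `roundtrip` are assumptions made for this example. It shows only the NumPy behavior and does not cover the Bodo-specific setup described in the File I/O section.

```python
import os
import tempfile

import numpy as np


def roundtrip(arr):
    """Write arr as raw binary, then read it back (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        arr.tofile(path)  # raw bytes only: no shape or dtype header
        # The dtype must be supplied again because the file does not store it.
        return np.fromfile(path, dtype=arr.dtype)
    finally:
        os.remove(path)


A = np.arange(8, dtype=np.float64)
B = roundtrip(A)
```

Because the file carries no metadata, a multi-dimensional array comes back flattened and would need an explicit reshape.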

Optional arguments are not supported unless explicitly mentioned here. For operations on multi-dimensional arrays, automatic broadcasting of dimensions of size 1 is not supported.
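
To make the categories above concrete, here is a small plain-NumPy sketch that combines an array comprehension, operators, and a few of the listed functions. The function name `elementwise_demo` is hypothetical, and the code runs as ordinary NumPy. Under Bodo the function would carry the `@bodo.jit` decorator, but whether every call compiles depends on the Bodo version and is not verified here. The array operands of each binary operation have the same shape, so nothing relies on broadcasting a size-1 dimension.

```python
import numpy as np


def elementwise_demo(N):
    # Array comprehension, as in item 8 of the list.
    A = np.array([i**2 for i in range(N)])
    B = np.arange(N)
    C = np.sqrt(A) + B  # listed math function plus a binary operator
    D = np.hypot(C, C)  # two-argument function applied element by element
    # Comparison operators combined with a listed logical function.
    mask = np.logical_and(A > 4, B < N - 1)
    return C, D, mask


C, D, mask = elementwise_demo(5)
```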

Numpy dot() Parallelization

The `np.dot` function follows different distribution rules depending on the number of dimensions and the distributions of its input arrays. The example below demonstrates two cases:

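```
import bodo
import numpy as np

@bodo.jit
def example_dot(N, D):
    X = np.random.ranf((N, D))  # 2D array
    Y = np.random.ranf(N)       # 1D array
    w = np.dot(Y, X)            # first case: vector times matrix
    z = np.dot(X, w)            # second case: matrix times vector
    return z.sum()

example_dot(1024, 10)
example_dot.distributed_diagnostics()
```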

Here is the output of distributed_diagnostics():

```
Data distributions:
   $X.130               1D_Block
   $Y.131               1D_Block
   $b.2.158             REP

Parfor distributions:
   0                    1D_Block
   1                    1D_Block
   3                    1D_Block

Distributed listing for function example_dot, ../tmp/dist_rep.py (4)
----------------------------------| parfor_id/variable: distribution
@bodo.jit                         |
def example_dot(N, D):            |
    X = np.random.ranf((N, D))----| #0: 1D_Block, $X.130: 1D_Block
    Y = np.random.ranf(N)---------| #1: 1D_Block, $Y.131: 1D_Block
    w = np.dot(Y, X)--------------| $b.2.158: REP
    z = np.dot(X, w)--------------| #3: 1D_Block
    return z.sum()                |
```

The first dot takes a 1D array with 1D_Block distribution (Y) as its first input and a 2D array with 1D_Block distribution (X) as its second. The operation is therefore a sum reduction across the distributed data, so the output (w) is on the reduce side and is assigned REP distribution.
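
The reduction can be mimicked in plain NumPy by splitting the rows into blocks by hand. In this sketch, `np.array_split` stands in for the 1D_Block distribution and the final sum stands in for the cross-process reduction. The block count `n_blocks` and all variable names are assumptions for illustration, not Bodo internals.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, n_blocks = 1024, 10, 4

X = rng.random((N, D))
Y = rng.random(N)

# Emulate 1D_Block: each "process" owns a contiguous chunk of rows.
X_blocks = np.array_split(X, n_blocks, axis=0)
Y_blocks = np.array_split(Y, n_blocks)

# Each block contributes a partial result of length D.
partials = [np.dot(y_blk, x_blk) for y_blk, x_blk in zip(Y_blocks, X_blocks)]

# Summing the partials plays the role of the reduction. Every process would
# end up holding the same full-length vector, which matches REP.
w = np.sum(partials, axis=0)
```

Each partial result already has the full output length `D`, which is why the result cannot stay partitioned by rows.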

The second dot takes a 2D array with 1D_Block distribution (X) as its first input and a REP array (w) as its second. The computation is therefore data-parallel across the rows of X, which implies a 1D_Block distribution for the output (z).
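
The row-wise independence can be checked with a similar hand-rolled split. Again this is only a plain-NumPy sketch: `np.array_split` emulates the 1D_Block layout, `n_blocks` is an assumed process count, and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, n_blocks = 1024, 10, 4

X = rng.random((N, D))
w = rng.random(D)  # replicated: every block sees the whole vector

# Each block of rows yields its own slice of z without needing other blocks.
z_blocks = [np.dot(x_blk, w) for x_blk in np.array_split(X, n_blocks, axis=0)]

# Concatenation is only for checking; in a distributed run the pieces
# would stay where they were computed.
z = np.concatenate(z_blocks)
```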

Variable z does not appear in the distribution report because compiler optimizations were able to eliminate it. Its values are generated and consumed on the fly, without memory load/store overhead.
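
One way to picture this elimination is a loop that folds each element of `z` into the running sum as soon as it is computed. The sketch below only illustrates that idea in plain Python and NumPy; it is not the code Bodo generates, and the function name `fused_dot_sum` is hypothetical.

```python
import numpy as np


def fused_dot_sum(X, w):
    total = 0.0
    for i in range(X.shape[0]):
        # z[i] is produced and consumed here; no array z is ever allocated.
        total += np.dot(X[i], w)
    return total


rng = np.random.default_rng(2)
X = rng.random((64, 10))
w = rng.random(10)
result = fused_dot_sum(X, w)
```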