Numpy Operations¶
Below is the list of the data-parallel Numpy operators that Bodo can optimize and parallelize.
Numpy element-wise array operations¶
Unary operators¶
+, -, ~
Binary operators¶
+, -, *, /, /?, %, |, >>, ^, <<, &, **, //
Comparison operators¶
==, !=, <, <=, >, >=
Data-parallel math operations¶
- numpy.add
- numpy.subtract
- numpy.multiply
- numpy.divide
- numpy.logaddexp
- numpy.logaddexp2
- numpy.true_divide
- numpy.floor_divide
- numpy.negative
- numpy.positive
- numpy.power
- numpy.remainder
- numpy.mod
- numpy.fmod
- numpy.abs
- numpy.absolute
- numpy.fabs
- numpy.rint
- numpy.sign
- numpy.conj
- numpy.exp
- numpy.exp2
- numpy.log
- numpy.log2
- numpy.log10
- numpy.expm1
- numpy.log1p
- numpy.sqrt
- numpy.square
- numpy.reciprocal
- numpy.gcd
- numpy.lcm
- numpy.conjugate
Trigonometric functions¶
- numpy.sin
- numpy.cos
- numpy.tan
- numpy.arcsin
- numpy.arccos
- numpy.arctan
- numpy.arctan2
- numpy.hypot
- numpy.sinh
- numpy.cosh
- numpy.tanh
- numpy.arcsinh
- numpy.arccosh
- numpy.arctanh
- numpy.deg2rad
- numpy.rad2deg
- numpy.degrees
- numpy.radians
Bit manipulation functions¶
- numpy.bitwise_and
- numpy.bitwise_or
- numpy.bitwise_xor
- numpy.bitwise_not
- numpy.invert
- numpy.left_shift
- numpy.right_shift
Comparison functions¶
- numpy.logical_and
- numpy.logical_or
- numpy.logical_xor
- numpy.logical_not
Floating functions¶
- numpy.isfinite
- numpy.isinf
- numpy.signbit
- numpy.ldexp
- numpy.floor
- numpy.ceil
- numpy.trunc
Numpy reduction functions¶
- numpy.sum
- numpy.prod
- numpy.min
- numpy.max
- numpy.argmin
- numpy.argmax
- numpy.all
- numpy.any
Numpy array creation functions¶
- numpy.empty
- numpy.identity
- numpy.zeros
- numpy.ones
- numpy.empty_like
- numpy.zeros_like
- numpy.ones_like
- numpy.full_like
- numpy.array
- numpy.asarray
- numpy.copy
- numpy.arange
- numpy.linspace
- numpy.repeat (only scalar num_repeats)
Numpy array manipulation functions¶
- numpy.shape
- numpy.reshape: shape values cannot be -1.
- numpy.sort
- numpy.concatenate
- numpy.append
- numpy.unique: The output is assumed to be "small" relative to input and is replicated. Use Series.drop_duplicates() if the output should remain distributed.
- numpy.where (1 and 3 arguments)
- numpy.select: The default value for numeric/boolean types is 0/False. For all other types, the default is pd.NA. If any of the values in choicelist are nullable, or the default is pd.NA or None, the output will be a nullable pandas array instead of a numpy array.
- numpy.nan_to_num: converts infinity/NaN values to regular floats.
- numpy.union1d
- numpy.intersect1d: no distributed support yet.
- numpy.setdiff1d: no distributed support yet.
- numpy.hstack: concatenates elements on each rank without maintaining order.
- numpy.tile: Supported in 2 cases: the array is 2D and reps is in the form (1, x), or the array is 1D and reps is in the form (x, 1).
- numpy.ndarray.T: distributed array transpose is supported for 2D arrays.
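The two supported numpy.tile forms can be hard to picture from the text alone. The following is a minimal sketch in plain NumPy (no Bodo decorator, so it only illustrates the shapes involved, not how Bodo distributes the work); the array names and sizes are chosen here purely for illustration.

```python
import numpy as np

# Case 1: a 2D array with reps of the form (1, x) repeats the columns x times.
A = np.arange(6).reshape(2, 3)
tiled_2d = np.tile(A, (1, 2))

# Case 2: a 1D array with reps of the form (x, 1) stacks x copies as rows.
v = np.arange(3)
tiled_1d = np.tile(v, (4, 1))

print(tiled_2d.shape)  # (2, 6)
print(tiled_1d.shape)  # (4, 3)
```

In both forms the number of rows of the result does not depend on the contents of other rows, which is presumably why these are the forms that can be handled for row-distributed data.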
Numpy mathematical and statistics functions¶
- numpy.cumsum
- numpy.diff
- numpy.percentile
- numpy.quantile
- numpy.median
- numpy.mean
- numpy.std
- numpy.interp: no distributed support yet.
- np.linalg.norm: parallelized only for 2D inputs with axis=1.
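To see why the 2D, axis=1 case of np.linalg.norm lends itself to parallelization, the sketch below emulates row-block distribution in plain NumPy. It is not Bodo's implementation: the "ranks" are simply chunks produced by np.array_split, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 3))

# Row-wise L2 norm: one value per row, so rows can be processed independently.
full = np.linalg.norm(X, axis=1)

# Emulate 4 ranks, each holding a contiguous block of rows.
blocks = np.array_split(X, 4, axis=0)
per_block = np.concatenate([np.linalg.norm(b, axis=1) for b in blocks])

# The block-wise results, concatenated in order, match the full computation.
assert np.allclose(full, per_block)
```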
Random number generator functions¶
- numpy.random.rand
- numpy.random.randn
- numpy.random.ranf
- numpy.random.random_sample
- numpy.random.sample
- numpy.random.random
- numpy.random.standard_normal
- numpy.random.multivariate_normal (must provide size)
- numpy.random.chisquare
- numpy.random.weibull
- numpy.random.power
- numpy.random.geometric
- numpy.random.exponential
- numpy.random.poisson
- numpy.random.rayleigh
- numpy.random.normal
- numpy.random.uniform
- numpy.random.beta
- numpy.random.binomial
- numpy.random.f
- numpy.random.gamma
- numpy.random.lognormal
- numpy.random.laplace
- numpy.random.randint
- numpy.random.triangular
numpy.dot function¶
- numpy.dot: between a matrix and a vector or between two vectors.
Numpy I/O¶
- numpy.ndarray.tofile
- numpy.fromfile: supports reading binary files. The file, dtype, count, and offset arguments are supported (file and dtype are required). file should be a string. s3:// and hdfs:// file paths are also supported.
Our documentation on scalable I/O contains example usage and more system specific instructions.
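As a quick reminder of how the supported numpy.fromfile arguments interact, here is a minimal round trip in plain NumPy on a local temporary file. The file name is hypothetical, and the sketch runs without Bodo; it only shows the argument semantics (in particular that offset is measured in bytes while count is measured in elements).

```python
import os
import tempfile

import numpy as np

data = np.arange(10, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.dat")  # hypothetical file name
    data.tofile(path)  # raw binary; no header or dtype information is stored

    # Read everything back; dtype must match what was written.
    everything = np.fromfile(path, dtype=np.float64)

    # Skip the first 2 elements (offset is in bytes) and read 3 elements.
    window = np.fromfile(path, dtype=np.float64, count=3,
                         offset=2 * data.itemsize)

print(window)  # [2. 3. 4.]
```

Because the file carries no type information, passing a dtype different from the one used for writing silently reinterprets the bytes, which is one reason dtype is a required argument.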
Numpy matrix support¶
- numpy.asmatrix: parallelized only for array or matrix input.
- *: left-hand side argument can be distributed but right-hand side argument is replicated.
Miscellaneous¶
- Numpy array comprehension, e.g. A = np.array([i**2 for i in range(N)])
Note
Optional arguments are not supported unless explicitly mentioned here. For operations on multi-dimensional arrays, automatic broadcast of dimensions of size 1 is not supported.
Numpy dot() Parallelization¶
The np.dot function has different distribution rules based
on the number of dimensions and the distributions of its input arrays.
The example below demonstrates two cases:
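import bodo
import numpy as np

@bodo.jit
def example_dot(N, D):
    X = np.random.ranf((N, D))  # 2D array, N rows by D columns
    Y = np.random.ranf(N)       # 1D array of length N
    w = np.dot(Y, X)            # first dot: vector times matrix
    z = np.dot(X, w)            # second dot: matrix times vector
    return z.sum()

example_dot(1024, 10)
example_dot.distributed_diagnostics()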
Here is the output of distributed_diagnostics():
Data distributions:
  $X.130      1D_Block
  $Y.131      1D_Block
  $b.2.158    REP

Parfor distributions:
  0           1D_Block
  1           1D_Block
  3           1D_Block

Distributed listing for function example_dot, ../tmp/dist_rep.py (4)
++++++++++++++++++++++++++++++++++| parfor_id/variable: distribution
@bodo.jit                         |
def example_dot(N, D):            |
    X = np.random.ranf((N, D))++++| #0: 1D_Block, $X.130: 1D_Block
    Y = np.random.ranf(N)+++++++++| #1: 1D_Block, $Y.131: 1D_Block
    w = np.dot(Y, X)++++++++++++++| $b.2.158: REP
    z = np.dot(X, w)++++++++++++++| #3: 1D_Block
    return z.sum()                |
In the first dot, the first input (Y) is a 1D array with 1D_Block
distribution and the second input (X) is a 2D array with 1D_Block
distribution.
Hence, this dot is a sum reduction across the distributed data,
so the output (w) is on the reduce side and is
assigned REP distribution.
The second dot has a 2D array with 1D_Block
distribution (X) as first input, while the second input is
a REP array (w). Hence, the computation is data-parallel
across rows of X, which implies a 1D_Block
distribution for output (z).
Variable z does not exist in the distribution report since
the compiler optimizations were able to eliminate it. Its values are
generated and consumed on-the-fly, without memory load/store overheads.
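The two cases described above can be mimicked in plain NumPy to make the reasoning concrete. This is only a sketch under stated assumptions: ranks are emulated with np.array_split, the rank count of 4 is arbitrary, and nothing here reflects how Bodo actually schedules or communicates the work.

```python
import numpy as np

N, D, n_ranks = 1024, 10, 4
rng = np.random.default_rng(0)
X = rng.random((N, D))
Y = rng.random(N)

# 1D_Block: each emulated rank owns a contiguous block of rows of X and Y.
X_blocks = np.array_split(X, n_ranks, axis=0)
Y_blocks = np.array_split(Y, n_ranks)

# First dot: each rank computes a partial length-D vector; summing the
# partials (the reduction) gives the full w, which every rank then holds (REP).
w = sum(np.dot(y_b, x_b) for y_b, x_b in zip(Y_blocks, X_blocks))

# Second dot: each rank multiplies its own rows of X by the replicated w,
# so the output stays split by rows (1D_Block).
z_blocks = [np.dot(x_b, w) for x_b in X_blocks]
z = np.concatenate(z_blocks)

# Both block-wise results agree with the undistributed computation.
assert np.allclose(w, np.dot(Y, X))
assert np.allclose(z, np.dot(X, np.dot(Y, X)))
```

The first dot needs a combining step across blocks, while the second needs none beyond having w available everywhere, which matches the REP and 1D_Block assignments in the report.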