9. Supported Machine Learning
9.1. Supported Scikit-learn
Below is the list of scikit-learn classes and functions that Bodo supports natively inside JIT functions. This list will expand regularly as we add support for more APIs. Optional arguments are not supported unless specified.
9.1.1. Linear Classifiers
sklearn.linear_model.LogisticRegression
This class provides a logistic regression classifier.
Methods:
sklearn.linear_model.SGDClassifier
This class provides linear classification models with SGD optimization, which allow distributed large-scale learning.
SGDClassifier(loss='hinge') is equivalent to a linear SVM classifier.
SGDClassifier(loss='log') is equivalent to a logistic regression classifier.
Supported loss functions are hinge and log.
early_stopping is not supported yet.
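As a minimal sketch of the hinge-loss case described above, the following trains a linear SVM with SGD on a small, hypothetical toy dataset. It is written in plain scikit-learn so that it runs without Bodo installed; placing the function under a Bodo JIT decorator is the assumed usage and is not shown. The helper name `fit_linear_svm`, the data, and the `random_state` argument are illustrative assumptions, and this section does not say whether `random_state` is accepted inside a JIT function.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def fit_linear_svm(X, y):
    # loss="hinge" selects the linear SVM objective, optimized with SGD.
    # random_state is fixed only so this sketch is repeatable (assumption).
    clf = SGDClassifier(loss="hinge", random_state=0)
    clf.fit(X, y)
    return clf


# Hypothetical toy data: two well-separated groups in 2-D.
X = np.array([[-3.0, -3.0], [-2.0, -2.5], [-2.5, -3.5],
              [3.0, 3.0], [2.0, 2.5], [2.5, 3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = fit_linear_svm(X, y)
pred = clf.predict(X)
accuracy = float((pred == y).mean())
```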
Methods:
sklearn.svm.LinearSVC
This class provides Linear Support Vector Classification.
Methods:
9.1.2. Linear Regressors
sklearn.linear_model.LinearRegression
This class provides linear regression support. Note: Multilabel targets are not currently supported.
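A minimal sketch of fitting a linear regression with a single 1-D target column, in line with the note above. It uses plain scikit-learn so that it runs without Bodo; the helper name `fit_line` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_line(X, y):
    # Default constructor only; no optional arguments are passed.
    reg = LinearRegression()
    reg.fit(X, y)
    return reg


# Hypothetical data lying exactly on y = 2x + 1, with a 1-D target.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

reg = fit_line(X, y)
pred = reg.predict(np.array([[4.0]]))
```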
Methods:
sklearn.linear_model.Ridge
This class provides ridge regression support.
Methods:
sklearn.linear_model.SGDRegressor
This class provides linear regression models with SGD optimization, which allow distributed large-scale learning.
SGDRegressor(loss='squared_loss', penalty='None') is equivalent to linear regression.
SGDRegressor(loss='squared_loss', penalty='l2') is equivalent to Ridge regression.
SGDRegressor(loss='squared_loss', penalty='l1') is equivalent to Lasso regression.
The supported loss function is squared_loss.
early_stopping is not supported yet.
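The three equivalences above can be read off the regularized objective that SGD minimizes. The form below follows the objective given in scikit-learn's SGD user guide as best recalled, so treat it as a sketch rather than a statement about Bodo's implementation. Here n is the number of samples, (x_i, y_i) are the training pairs, w and b are the weights and intercept, and α is the regularization strength (scikit-learn's `alpha`).

```latex
\min_{w,\,b}\;
\frac{1}{n}\sum_{i=1}^{n} \tfrac{1}{2}\bigl(y_i - w^\top x_i - b\bigr)^2
\;+\; \alpha\,R(w),
\qquad
R(w) =
\begin{cases}
0 & \text{no penalty (linear regression)}\\[2pt]
\tfrac{1}{2}\lVert w\rVert_2^2 & \text{penalty = l2 (Ridge)}\\[2pt]
\lVert w\rVert_1 & \text{penalty = l1 (Lasso)}
\end{cases}
```

The constant factors differ from the closed-form Ridge objective, so the match holds up to a rescaling of α, and SGD reaches the minimizer only approximately.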
Methods:
sklearn.linear_model.Lasso
This class provides Lasso regression support.
Methods:
9.1.3. Clustering
sklearn.cluster.KMeans
This class provides K-Means clustering models, which allow distributed large-scale unsupervised learning.
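A minimal sketch of K-Means clustering on two well-separated groups of points, written in plain scikit-learn so that it runs without Bodo. The helper name `cluster_points`, the toy data, and the `n_init` and `random_state` arguments are assumptions made for a repeatable example; this section does not state which optional arguments Bodo accepts.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_points(X):
    # Two clusters; n_init and random_state are set explicitly so the
    # result does not depend on library defaults (assumption).
    km = KMeans(n_clusters=2, n_init=10, random_state=0)
    km.fit(X)
    return km.predict(X)


# Hypothetical data: three points near the origin, three near (10, 10).
X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.4],
              [10.0, 10.0], [10.3, 9.8], [9.9, 10.4]])

labels = cluster_points(X)
```

Cluster ids are arbitrary, so only the grouping is meaningful, not which group receives label 0.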
Methods:
9.1.4. Ensemble Methods
sklearn.ensemble.RandomForestClassifier
This class provides a Random Forest classifier, an ensemble learning model, for distributed large-scale learning.
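A minimal sketch of training and applying a random forest, in plain scikit-learn so that it runs without Bodo. The helper name `fit_forest`, the toy data, and the `n_estimators` and `random_state` values are illustrative assumptions and are not taken from this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_forest(X, y):
    # A small forest keeps the sketch fast; random_state makes it repeatable.
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    clf.fit(X, y)
    return clf


# Hypothetical toy data: two well-separated groups in 2-D.
X = np.array([[-3.0, -3.0], [-2.0, -2.5], [-2.5, -3.5],
              [3.0, 3.0], [2.0, 2.5], [2.5, 3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = fit_forest(X, y)
pred = clf.predict(X)
accuracy = float((pred == y).mean())
```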
Methods:
9.1.5. Naive Bayes
sklearn.naive_bayes.MultinomialNB
This class provides a Naive Bayes classifier for multinomial models with distributed large-scale learning.
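A minimal sketch of a multinomial Naive Bayes classifier on count features (for example, word counts per document), in plain scikit-learn so that it runs without Bodo. The helper name `fit_nb` and the count matrix are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB


def fit_nb(X, y):
    # Default constructor only; features must be non-negative counts.
    clf = MultinomialNB()
    clf.fit(X, y)
    return clf


# Hypothetical count data: class 0 is dominated by feature 0,
# class 1 by feature 1.
X = np.array([[5, 0, 1], [4, 1, 0], [0, 5, 1], [1, 4, 0]])
y = np.array([0, 0, 1, 1])

clf = fit_nb(X, y)
pred = clf.predict(X)
```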
Methods:
9.1.6. Classification metrics
9.1.7. Regression metrics
9.1.8. Data Preprocessing
sklearn.preprocessing.StandardScaler
This class provides Standard Scaler support to center your data and to scale it to achieve unit variance.
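A minimal sketch of what centering and scaling to unit variance means in practice: after the transform, each column has mean 0 and standard deviation 1. It uses plain scikit-learn so that it runs without Bodo; the helper name `standardize` and the data are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize(X):
    scaler = StandardScaler()
    scaler.fit(X)               # learns per-column mean and variance
    return scaler.transform(X)  # subtracts the mean, divides by the std


# Hypothetical data with columns on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

X_std = standardize(X)
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)
```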
Methods:
sklearn.preprocessing.MinMaxScaler
This class provides MinMax Scaler support to scale your data based on the range of its features.
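A minimal sketch of range-based scaling: with the scikit-learn default feature range, each column is mapped linearly so that its minimum becomes 0 and its maximum becomes 1. It uses plain scikit-learn so that it runs without Bodo; the helper name `rescale` and the data are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def rescale(X):
    scaler = MinMaxScaler()
    scaler.fit(X)               # learns per-column min and max
    return scaler.transform(X)  # maps each column onto [0, 1]


# Hypothetical data: each column is evenly spaced.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

X_scaled = rescale(X)
```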
Methods:
sklearn.preprocessing.LabelEncoder
This class provides LabelEncoder support to encode target labels (y) with values between 0 and n_classes-1.
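A minimal sketch of label encoding: the distinct labels are sorted and each one is replaced by its index in that sorted order. It uses plain scikit-learn so that it runs without Bodo; the helper name `encode_labels` and the labels are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder


def encode_labels(y):
    le = LabelEncoder()
    le.fit(y)               # learns the sorted set of distinct labels
    return le.transform(y)  # maps each label to its index in that set


# Hypothetical string labels; sorted classes are bird, cat, dog.
y = np.array(["cat", "dog", "bird", "dog", "cat"])

codes = encode_labels(y)  # bird -> 0, cat -> 1, dog -> 2
```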
Methods:
9.1.9. Model Selection
sklearn.model_selection.train_test_split()
Currently it only supports two inputs, which must be NumPy arrays and/or pandas DataFrames.
Arguments train_size and test_size accept only a float between 0.0 and 1.0 or None.
Arguments random_state and shuffle are supported.
Argument stratify is not supported yet.
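A minimal sketch of a call that stays within the constraints listed above: exactly two inputs (both NumPy arrays here), a float `test_size`, and the supported `random_state` and `shuffle` arguments, with no `stratify`. It uses plain scikit-learn so that it runs without Bodo; the helper name `split_data` and the data are hypothetical, and the sizes shown are those of a single-process run.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_data(X, y):
    # Two inputs only; test_size is a float in (0.0, 1.0); stratify is unused.
    return train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)


# Hypothetical data: row i of X is [2i, 2i + 1], and y[i] = i.
X = np.arange(16, dtype=np.float64).reshape(8, 2)
y = np.arange(8)

X_train, X_test, y_train, y_test = split_data(X, y)
```

Rows of X and entries of y are shuffled together, so each feature row stays paired with its own target after the split.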
9.2. Supported XGBoost
Below is the list of XGBoost (using the Scikit-Learn-like API) classes and functions that Bodo supports natively inside JIT functions. This list will expand regularly as we add support for more APIs.
9.2.1. XGBClassifier
This class provides an implementation of the scikit-learn API for XGBoost classification with distributed large-scale learning.
Methods:
9.2.2. XGBRegressor
This class provides an implementation of the scikit-learn API for XGBoost regression with distributed large-scale learning.
Methods: