Note
Click here to download the full example code
Benchmark of Frank-Wolfe variants for sparse logistic regressionΒΆ
Comparison of different Frank-Wolfe variants on various
problems with a logistic regression loss (copt.utils.LogLoss()
)
and a L1 ball constraint (copt.utils.L1Ball()
).
Out:
Running on the Gisette dataset
gisette dataset is not present in the folder /builder/home/copt_data/gisette. Downloading it ...
Finished downloading
Extracting data to /builder/home/copt_data/gisette/X_test.indptr.npy
Extracting data to /builder/home/copt_data/gisette/y_train.npy
Extracting data to /builder/home/copt_data/gisette/X_test.data.npy
Extracting data to /builder/home/copt_data/gisette/X_train.indptr.npy
Extracting data to /builder/home/copt_data/gisette/y_test.npy
Extracting data to /builder/home/copt_data/gisette/X_train.data.npy
Extracting data to /builder/home/copt_data/gisette/X_train.indices.npy
Extracting data to /builder/home/copt_data/gisette/X_test.indices.npy
Sparsity of solution: 0.0002
Running on the RCV1 dataset
rcv1 dataset is not present in the folder /builder/home/copt_data/rcv1. Downloading it ...
Finished downloading
Extracting data to /builder/home/copt_data/rcv1/X_test.indptr.npy
Extracting data to /builder/home/copt_data/rcv1/y_train.npy
Extracting data to /builder/home/copt_data/rcv1/X_test.data.npy
Extracting data to /builder/home/copt_data/rcv1/X_train.indptr.npy
Extracting data to /builder/home/copt_data/rcv1/y_test.npy
Extracting data to /builder/home/copt_data/rcv1/X_train.data.npy
Extracting data to /builder/home/copt_data/rcv1/X_train.indices.npy
Extracting data to /builder/home/copt_data/rcv1/X_test.indices.npy
Sparsity of solution: 0.0006774494029977136
Running on the Madelon dataset
madelon dataset is not present in the folder /builder/home/copt_data/madelon. Downloading it ...
Finished downloading
Extracting data to /builder/home/copt_data/madelon/X_train.indptr.npy
Extracting data to /builder/home/copt_data/madelon/X_test.indptr.npy
Extracting data to /builder/home/copt_data/madelon/y_test.npy
Extracting data to /builder/home/copt_data/madelon/y_train.npy
Extracting data to /builder/home/copt_data/madelon/X_test.data.npy
Extracting data to /builder/home/copt_data/madelon/X_train.indices.npy
Extracting data to /builder/home/copt_data/madelon/X_train.data.npy
Extracting data to /builder/home/copt_data/madelon/X_test.indices.npy
Sparsity of solution: 0.004
Running on the Covtype dataset
covtype dataset is not present in the folder /builder/home/copt_data/covtype. Downloading it ...
Finished downloading
Extracting data to /builder/home/copt_data/covtype/y_train.npy
Extracting data to /builder/home/copt_data/covtype/X_train.indptr.npy
Extracting data to /builder/home/copt_data/covtype/X_train.data.npy
Extracting data to /builder/home/copt_data/covtype/X_train.indices.npy
Sparsity of solution: 0.25925925925925924
import matplotlib.pyplot as plt
import numpy as np
import copt as cp
# .. datasets and their loading functions ..
import copt.constraint
import copt.loss
datasets = [
("Gisette", cp.datasets.load_gisette, 6e3),
("RCV1", cp.datasets.load_rcv1, 2e4),
("Madelon", cp.datasets.load_madelon, 20.0),
("Covtype", cp.datasets.load_covtype, 200.0),
]
variants_fw = [
["backtracking", "adaptive step-size", "s"],
["DR", "Lipschitz step-size", "<"],
]
for dataset_title, load_data, alpha in datasets:
plt.figure()
print("Running on the %s dataset" % dataset_title)
X, y = load_data()
n_samples, n_features = X.shape
l1_ball = copt.constraint.L1Ball(alpha)
f = copt.loss.LogLoss(X, y)
x0 = np.zeros(n_features)
for step, label, marker in variants_fw:
cb = cp.utils.Trace(f)
sol = cp.minimize_frank_wolfe(
f.f_grad, x0, l1_ball.lmo, callback=cb, step=step, lipschitz=f.lipschitz
)
plt.plot(cb.trace_time, cb.trace_fx, label=label, marker=marker, markevery=10)
print("Sparsity of solution: %s" % np.mean(np.abs(sol.x) > 1e-8))
plt.legend()
plt.xlabel("Time (in seconds)")
plt.ylabel("Objective function")
plt.title(dataset_title)
plt.tight_layout() # otherwise the right y-label is slightly clipped
plt.xlim((0, 0.7 * cb.trace_time[-1])) # for aesthetics
plt.grid()
plt.show()
Total running time of the script: ( 12 minutes 14.990 seconds)
Estimated memory usage: 1950 MB