copt.datasets.load_rcv1

copt.datasets.load_rcv1(subset='full', data_dir='/builder/home/copt_data')

Download and return the RCV1 dataset.

Properties:

n_samples: 697641, n_features: 47236, density: 0.1% nonzero coefficients in the train set

This is the binary classification version of the dataset as found in the LIBSVM dataset project:

Parameters
  • subset – string. Can be one of 'full' for the full dataset, 'train' for only the train set, or 'test' for only the test set.

  • data_dir – string. Directory from which to read the data. Defaults to $HOME/copt_data/

Returns

X: scipy.sparse CSR matrix

y: numpy array. Labels, only takes values 0 or 1.
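
Example

The following is a minimal sketch, not part of copt, of how the returned pair might be sanity-checked against the properties listed above. The helper name summarize and the tiny synthetic matrix are assumptions made for illustration; the real call is shown only as a comment because it downloads the dataset.

```python
import numpy as np
from scipy import sparse


def summarize(X, y):
    """Return basic properties of a sparse feature matrix X and label vector y.

    Hypothetical helper for illustration; it is not provided by copt.
    """
    n_samples, n_features = X.shape
    # Fraction of stored entries; assumes X holds no explicitly stored zeros.
    density = X.nnz / (n_samples * n_features)
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "density": density,
        "is_csr": X.format == "csr",
        "binary_labels": bool(np.isin(y, [0, 1]).all()),
    }


# Tiny synthetic stand-in with the same types the loader is documented to return.
X_demo = sparse.csr_matrix(np.array([[0.0, 1.5, 0.0, 0.0],
                                     [0.0, 0.0, 0.0, 2.0]]))
y_demo = np.array([0, 1])
summary = summarize(X_demo, y_demo)
print(summary)

# With copt installed, the same helper could be applied to the real data
# (this triggers a download if the files are not already in data_dir):
# import copt
# X, y = copt.datasets.load_rcv1(subset='train')
# print(summarize(X, y))
```

On the real data, the density value reported by such a helper would be expected to be close to the 0.1% figure quoted under Properties.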