Dataset

Classification_problem

class gaggle.problem.dataset.classification_problem.ClassificationProblem(problem_args: ProblemArgs = None, sys_args: SysArgs = None)[source]

Bases: Problem

A Problem that represents a standard machine learning classification problem. It stores the associated training and validation datasets. Population evaluation is optimized for GPU by default to speed up training. To create a classification problem with a custom dataset, register that dataset in the DatasetFactory.

evaluate(individual: Individual, train: bool = True, *args, **kwargs) float[source]

Evaluates an individual on the current batch of data.

Parameters:
  • individual

  • train – whether we are currently training or performing inference.

  • *args

  • **kwargs

Returns:

evaluate_population(population_manager: PopulationManager, use_freshness: bool = True, update_manager: bool = True, train: bool = True, *args, **kwargs) dict[int, float][source]

Population evaluation is optimized for GPU by default to speed up training. Modifying this function is usually not recommended; override it only if specific custom behavior is required.

Parameters:
  • population_manager

  • use_freshness

  • update_manager

  • train

  • *args

  • **kwargs

Returns:

The dictionary of individual fitnesses
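The shape of the returned dictionary can be illustrated with a small stand-in. This is only a sketch under stated assumptions: it uses classification accuracy as the fitness, keys individuals by their integer index, and loops sequentially rather than using the GPU-optimized path. The names `accuracy` and `evaluate_population_sketch` are hypothetical, and the `use_freshness` and `update_manager` flags are not modelled.

```python
from typing import Callable, Dict, Sequence


def accuracy(predict: Callable[[int], int],
             inputs: Sequence[int],
             labels: Sequence[int]) -> float:
    # Fraction of samples whose prediction matches the label.
    # Assumption: accuracy is the fitness; the docs do not state this.
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def evaluate_population_sketch(population: Sequence[Callable[[int], int]],
                               inputs: Sequence[int],
                               labels: Sequence[int]) -> Dict[int, float]:
    # Map each individual's integer index to its fitness on the batch.
    return {i: accuracy(individual, inputs, labels)
            for i, individual in enumerate(population)}


# Toy batch and a population of two hypothetical "individuals".
inputs = [0, 1, 2, 3]
labels = [0, 1, 0, 1]
population = [lambda x: x % 2, lambda x: 0]

fitnesses = evaluate_population_sketch(population, inputs, labels)
```

Here `fitnesses` has one entry per individual, with an `int` key and a `float` value, matching the documented return type.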

Dataset

class gaggle.problem.dataset.dataset.DataWrapper(data: Tensor = None, targets: Tensor = None)[source]

Bases: Dataset

Wrapper that sets the .data and .targets attributes, which the Dataset class then accesses in its get_data_and_targets method.

See also

This class creates attributes that are used by Dataset.get_data_and_targets.

class gaggle.problem.dataset.dataset.Dataset(problem_args: ProblemArgs = None, train: bool = True, sys_args: SysArgs = None)[source]

Bases: Dataset, ABC

Dataset class that allows for more flexible custom indexing and other behavior.

copy()[source]

Return a copy of this dataset instance.

enable_normalization(enable: bool) None[source]

Method to enable or disable normalization.

get_data_and_targets()[source]

Gets the data and targets of the current dataset, which are stored in the self.data object. That object must expose .data and .targets attributes; these are the values returned.

Returns:

A tuple containing (data, targets) or (None, None) if the dataset is not initialized.
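The contract described above can be sketched with plain Python stand-ins. This is an illustration of the documented behavior only, not gaggle's code: `Wrapper` and `SketchDataset` are hypothetical names, and plain lists stand in for tensors.

```python
from typing import Any, Optional, Tuple


class Wrapper:
    """Hypothetical stand-in for an object exposing .data and .targets."""

    def __init__(self, data: Any = None, targets: Any = None):
        self.data = data
        self.targets = targets


class SketchDataset:
    """Illustrates the documented contract; not the library implementation."""

    def __init__(self, wrapped: Optional[Wrapper] = None):
        # self.data holds the wrapper, which in turn holds .data and .targets.
        self.data = wrapped

    def get_data_and_targets(self) -> Tuple[Any, Any]:
        if self.data is None:
            # Dataset not initialized: return the documented (None, None).
            return None, None
        return self.data.data, self.data.targets


empty = SketchDataset()
filled = SketchDataset(Wrapper(data=[[0.0, 1.0], [1.0, 0.0]], targets=[0, 1]))
```

Calling `get_data_and_targets()` on `empty` yields `(None, None)`, while on `filled` it yields the wrapped data and targets.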

get_data_and_transform()[source]

Returns:

A tuple ((data, targets), transforms).

num_classes() int[source]

Return the number of classes

print_class_distribution()[source]

random_subset(n: int)[source]

Creates a random subset of this dataset

remove_classes(target_classes: List[int])[source]

Creates a subset that excludes all samples belonging to the given target classes.

size()[source]

Alternative function to get the size.

subset(idx: List[int] | int)[source]

Creates a subset of this dataset.
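The three subsetting methods (subset, random_subset, remove_classes) can be thought of as different ways of choosing which sample indices to keep. The sketch below shows that idea on plain parallel lists. It is not gaggle's implementation: the free functions and the `seed` parameter are hypothetical, sampling without replacement is an assumption, and the `int` form of `idx` accepted by subset is not modelled.

```python
import random
from typing import List, Sequence, Tuple


def subset(data: Sequence, targets: Sequence,
           idx: List[int]) -> Tuple[list, list]:
    # Keep only the samples at the given indices.
    return [data[i] for i in idx], [targets[i] for i in idx]


def random_subset(data: Sequence, targets: Sequence,
                  n: int, seed: int = 0) -> Tuple[list, list]:
    # Assumption: n distinct samples are drawn without replacement.
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), n)
    return subset(data, targets, idx)


def remove_classes(data: Sequence, targets: Sequence,
                   target_classes: List[int]) -> Tuple[list, list]:
    # Keep every sample whose label is not in the excluded classes.
    excluded = set(target_classes)
    keep = [i for i, t in enumerate(targets) if t not in excluded]
    return subset(data, targets, keep)


data = ["a", "b", "c", "d"]
targets = [0, 1, 0, 2]

picked = subset(data, targets, [0, 2])
sampled = random_subset(data, targets, n=2)
filtered = remove_classes(data, targets, [0])
```

In each case the data and targets are filtered with the same indices, so samples and labels stay aligned.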

visualize(sq_size: int = 3) None[source]

Plot samples from this dataset as a square.

without_normalization() Dataset[source]

Return a copy of this dataset without normalization.

Dataset_factory

class gaggle.problem.dataset.dataset_factory.DatasetFactory[source]

Bases: object

Factory that creates the available pre-built datasets. DatasetFactory.datasets stores them as a dictionary that maps each dataset's name to its uninstantiated Dataset class.

See also

Dataset Class

datasets = {'CIFAR10': <class 'gaggle.problem.dataset.base_datasets.cifar10.CIFAR10'>, 'MNIST': <class 'gaggle.problem.dataset.base_datasets.mnist.MNIST'>}

static from_data(data: Tensor, targets: Tensor, train: bool = True, seed: int = 1337) Dataset[source]

Creates a basic dataset object from given data and targets with basic arguments.

Parameters:
  • data – data tensor

  • targets – target/label tensor

  • train – whether it is a training or evaluation dataset

  • seed – seed for the randomness of the batch sampling

Returns:

A Dataset object.

classmethod from_problem_args(problem_args: ProblemArgs = None, train: bool = True, sys_args: SysArgs = None) Dataset[source]

Initializes the requested dataset from the dictionary of available datasets.

This is done by using the attribute problem_args.dataset_name as the lookup key to DatasetFactory.datasets.

Parameters:
  • problem_args – problem args that will be used to build the Dataset

  • train – whether we should return the training or evaluation dataset

  • sys_args – system args

Returns:

A Dataset object.

classmethod get_keys()[source]

Gets the keys (dataset names) for the available pre-built datasets.

Returns:

A list of strings that are the keys to DatasetFactory.datasets.

classmethod update(key, dataset)[source]

Add a new dataset to the dictionary of datasets that can be created.

It is added to DatasetFactory.datasets

Parameters:
  • key – dataset name that will be used as the dictionary lookup key

  • dataset – the dataset class itself; it must not already be instantiated
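The registration flow described in this section (add a class with update, list the names with get_keys, then look a class up by name and instantiate it) can be illustrated with a small stand-in registry. This is a sketch of the pattern only, not gaggle's implementation: `SketchFactory`, `ToyDataset`, and `from_name` are hypothetical names, and `from_name` merely plays the role that from_problem_args plays when it uses problem_args.dataset_name as the lookup key.

```python
class ToyDataset:
    """Hypothetical dataset class used only for this illustration."""

    def __init__(self, train: bool = True):
        self.train = train


class SketchFactory:
    """Stand-in registry mirroring the update / get_keys / lookup pattern."""

    # Maps a dataset name to an uninstantiated class.
    datasets: dict = {}

    @classmethod
    def update(cls, key: str, dataset: type) -> None:
        # Assumption of this sketch: reject instances explicitly. The docs
        # only say the class must not already be instantiated.
        if not isinstance(dataset, type):
            raise TypeError("register the class, not an instance")
        cls.datasets[key] = dataset

    @classmethod
    def get_keys(cls) -> list:
        return list(cls.datasets.keys())

    @classmethod
    def from_name(cls, name: str, train: bool = True):
        # Look the class up by name, then instantiate it.
        return cls.datasets[name](train=train)


SketchFactory.update("toy", ToyDataset)
dataset = SketchFactory.from_name("toy", train=False)
```

Storing classes rather than instances defers construction until a dataset is actually requested, which is why the `dataset` argument must be passed uninstantiated.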