
# MixMatch

This is my implementation of the experiments in the MixMatch paper. On my platform, accuracy reaches 89%+ on CIFAR-10 with 250 labeled images.

## Environment setup

- 2080 Ti GPU
- Ubuntu 16.04
- Python 3.6.9
- PyTorch 1.2.0 (from conda)
- cudatoolkit 10.0.130 (from conda)
- cuDNN 7.6.2 (in /usr/lib/x86_64-linux-gnu)

## Dataset

Download the CIFAR-10 dataset:

    $ mkdir -p dataset && cd dataset
    $ wget -c http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
    $ tar -xzvf cifar-10-python.tar.gz

## Train the model

    $ sh run.sh

## Some notes I made during the experiments

1. Maintain an exponential moving average (EMA) of the model parameters (see the EMA sketch after this list).

2. Though softmax tends to hurt training with an MSE loss, the paper still uses MSE (between the softmax predictions on unlabeled data and the guessed labels) as the unlabeled loss; see the consistency-loss sketch below.

3. It is better to warm up the balance factor between the labeled loss and the unlabeled loss. The official repository increases this factor from 0 to 75 over the whole 1024 epochs; the point is presumably to increase the contribution of the unlabeled data slowly (see the ramp sketch below).

4. Do not use dropout in the WideResNet-28-2.

5. Weight decay should be applied directly to the model weights (not the EMA weights), rather than via the optimizer's weight_decay option, which actually adds the decay term to the gradients; see the weight-decay sketch below.

6. Use the EMA parameters to guess the labels for the unlabeled data; that is what Mean Teacher does (see the label-guessing sketch below).

7. MixUp should use a different mixing coefficient for each sample in the batch, rather than one coefficient for the whole batch; see the per-sample MixUp sketch below.
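
Below are short PyTorch sketches of these notes. They are illustrative, not this repo's actual code; function names and hyper-parameters are assumptions unless noted.

A minimal sketch of the EMA update from note 1 (the decay rate `alpha=0.999` is a common default, not taken from this repo):

    import torch

    @torch.no_grad()
    def update_ema(model, ema_model, alpha=0.999):
        # ema <- alpha * ema + (1 - alpha) * current, parameter by parameter
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(alpha).add_((1 - alpha) * p)
        # buffers (e.g. batch-norm running stats) are copied over directly
        for ema_b, b in zip(ema_model.buffers(), model.buffers()):
            ema_b.copy_(b)

    # Usage: start from a frozen deep copy of the training model
    # (copy.deepcopy(model), then requires_grad_(False) on its parameters),
    # and call update_ema(model, ema_model) after every optimizer step.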
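
The consistency-loss sketch for note 2: MSE is computed between softmax probabilities and the guessed soft labels, not between logits and hard labels. Names are illustrative:

    import torch
    import torch.nn.functional as F

    def unlabeled_mse_loss(logits_u, guessed_labels):
        # logits_u: (batch, num_classes) model outputs on unlabeled data
        # guessed_labels: (batch, num_classes) probability targets, detached
        probs_u = torch.softmax(logits_u, dim=1)
        return F.mse_loss(probs_u, guessed_labels)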
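
The ramp sketch for note 3, assuming a simple linear schedule from 0 to 75 over the 1024 training epochs (the exact shape of the official schedule is not spelled out above):

    def unlabeled_weight(epoch, lambda_u_max=75.0, rampup_epochs=1024):
        # linearly ramp the unlabeled-loss weight from 0 to lambda_u_max
        return lambda_u_max * min(epoch / rampup_epochs, 1.0)

    # Total loss per batch:
    #   loss = loss_labeled + unlabeled_weight(epoch) * loss_unlabeled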
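
The weight-decay sketch for note 5: the weights of the training model (not the EMA model) are shrunk in place after each optimizer step, instead of passing weight_decay to the optimizer; the learning rate and decay values shown are assumptions:

    import torch

    @torch.no_grad()
    def decay_weights(model, lr, wd):
        # multiplicative shrinkage applied to the weights themselves:
        #   w <- w * (1 - lr * wd)
        for p in model.parameters():
            p.mul_(1 - lr * wd)

    # Per iteration:
    #   optimizer.step()
    #   decay_weights(model, lr=0.002, wd=0.02)  # hypothetical values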
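
The label-guessing sketch for note 6, using the EMA parameters as Mean Teacher does; averaging over K augmented views and temperature sharpening with `T=0.5` follow the paper's defaults:

    import torch

    @torch.no_grad()
    def guess_labels(ema_model, u_views, T=0.5):
        # u_views: list of K differently-augmented views of the same unlabeled batch
        probs = torch.stack(
            [torch.softmax(ema_model(u), dim=1) for u in u_views]
        ).mean(dim=0)
        # temperature sharpening from the MixMatch paper
        probs = probs ** (1.0 / T)
        return probs / probs.sum(dim=1, keepdim=True)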
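
The per-sample MixUp sketch for note 7: one Beta(alpha, alpha) draw per sample, broadcast over the image and label dimensions. `alpha=0.75` and the `max(lam, 1 - lam)` trick are from the MixMatch paper; `y1`/`y2` are assumed to be soft (probability) labels:

    import numpy as np
    import torch

    def mixup_per_sample(x1, y1, x2, y2, alpha=0.75):
        # one mixing coefficient per sample, not one for the whole batch
        lam = torch.from_numpy(
            np.random.beta(alpha, alpha, size=x1.size(0))
        ).float().to(x1.device)
        lam = torch.max(lam, 1 - lam)     # keep lam >= 0.5 (MixMatch detail)
        lam_x = lam.view(-1, 1, 1, 1)     # broadcast over (C, H, W)
        lam_y = lam.view(-1, 1)           # broadcast over classes
        x = lam_x * x1 + (1 - lam_x) * x2
        y = lam_y * y1 + (1 - lam_y) * y2
        return x, y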