📖 Arxiv: 2201.04182
💡 The main idea is to use a transformer model that, given a few-shot task episode, generates an entire inference model by producing all of its weights in a single forward pass (see the sketch below).
Small CNN Architectures: generating all of the model weights from the support set is more effective than learning a fixed, task-independent embedding.
Large CNN Architectures: generating only the final layer (with the rest of the CNN trained conventionally) is enough to stay competitive.
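To make the mechanism concrete, here is a minimal PyTorch sketch of the weight-generation idea: a transformer encoder reads the labeled support set of an episode and emits, in one forward pass, the parameters of a classifier that is then applied to the query images. For brevity it generates only a final linear layer rather than every CNN weight; the `WeightGenerator` class, layer sizes, and episode format are illustrative assumptions, not the paper's exact HyperTransformer architecture.

```python
# Sketch only: a transformer that produces classifier weights from a few-shot episode.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightGenerator(nn.Module):
    def __init__(self, n_way=5, feat_dim=64, d_model=128, n_layers=2):
        super().__init__()
        self.n_way, self.feat_dim = n_way, feat_dim
        # Shared conv backbone that embeds both support and query images.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Transformer encoder over support tokens (image embedding + label embedding).
        self.label_emb = nn.Embedding(n_way, d_model)
        self.in_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # One learned query token per generated weight chunk: here, one classifier
        # row (weights + bias) per class of the episode.
        self.weight_queries = nn.Parameter(torch.randn(n_way, d_model))
        self.out_proj = nn.Linear(d_model, feat_dim + 1)

    def generate_head(self, support_x, support_y):
        """Produce the (n_way, feat_dim) weight matrix and bias from the support set."""
        feats = self.in_proj(self.backbone(support_x)) + self.label_emb(support_y)
        tokens = torch.cat([self.weight_queries.unsqueeze(0), feats.unsqueeze(0)], dim=1)
        encoded = self.encoder(tokens)[0, : self.n_way]  # read off the query tokens
        params = self.out_proj(encoded)                  # (n_way, feat_dim + 1)
        return params[:, :-1], params[:, -1]             # weight rows, biases

    def forward(self, support_x, support_y, query_x):
        weight, bias = self.generate_head(support_x, support_y)
        query_feats = self.backbone(query_x)
        return F.linear(query_feats, weight, bias)       # logits for the query set


# Episode-level usage: the query logits come from weights produced for this task only,
# and the whole pipeline (backbone + generator) is trained end to end across episodes.
model = WeightGenerator()
sx, sy = torch.randn(25, 3, 32, 32), torch.arange(5).repeat(5)   # 5-way 5-shot support
qx, qy = torch.randn(15, 3, 32, 32), torch.randint(0, 5, (15,))  # query set
loss = F.cross_entropy(model(sx, sy, qx), qy)
loss.backward()
```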
We develop a novel replay buffer consistent with the architecture and training protocol of ODT.