
Recently, an evolution of Google's EfficientNet set a new ImageNet record. The new model, called Noisy Student, raised top-1 accuracy on ImageNet image classification to 87.4%, 1% higher than the previous best, FixResNeXt-101 32×48d. Even more striking, on the ImageNet-A test set, where ResNeXt-101 32×48d achieved only 16.6% top-1 accuracy, Noisy Student lifted accuracy to 74.2%. (Source: Sina column Chuangshiji)

The new model comes from the team of Quoc Le, a chief scientist at Google Brain and one of its founding members. The paper's first author, Qizhe Xie, is a graduate of the fifth cohort of the Zhiyuan Honors Program in computer science (ACM Class). He is currently pursuing a PhD at Carnegie Mellon University, advised by Eduard Hovy, an inaugural ACL Fellow, and his research spans deep learning, computer vision, and natural language processing. Xie is also a student researcher at Google Brain, where he works under Quoc Le, a founding member of Google Brain and an inventor of Seq2Seq and AutoML.

In the summer after his sophomore year, Xie joined the SJTU Intelligent Speech Lab to work on dialogue systems, where the careful guidance of Prof. Kai Yu instilled in him a rigorous attitude toward research. In the first semester of his senior year, he interned in the Machine Learning Group at Microsoft Research Asia, conducting natural language processing research under Dr. Tie-Yan Liu, a deputy managing director of MSRA. Dr. Liu's insistence on doing impactful research continues to encourage Xie's commitment to research today.

The abstract of the paper "Self-training with Noisy Student improves ImageNet classification" reads:
We present a simple self-training method that achieves 87.4% top-1 accuracy on ImageNet, which is 1.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 16.6% to 74.2%, reduces ImageNet-C mean corruption error from 45.7 to 31.2, and reduces ImageNet-P mean flip rate from 27.8 to 16.1. 
To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as good as possible. But during the learning of the student, we inject noise such as data augmentation, dropout, stochastic depth to the student so that the noised student is forced to learn harder from the pseudo labels.
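The teacher–student iteration described above can be sketched in a few lines. The sketch below is a minimal toy illustration, not the paper's actual pipeline: a nearest-centroid classifier stands in for EfficientNet, Gaussian input jitter stands in for RandAugment/dropout/stochastic depth, and the toy blob data replaces ImageNet and the 300M unlabeled images. The function names (`train`, `predict`, `add_noise`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(images, labels):
    # Stand-in for supervised training: a nearest-centroid classifier.
    classes = np.unique(labels)
    return np.stack([images[labels == c].mean(axis=0) for c in classes])

def predict(model, images):
    # Nearest-centroid prediction; the teacher runs un-noised so the
    # pseudo labels are as clean as possible.
    dist = ((images[:, None, :] - model[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def add_noise(images, scale=0.1):
    # Toy input noise, standing in for data augmentation, dropout,
    # and stochastic depth applied to the student.
    return images + rng.normal(0.0, scale, images.shape)

# Toy data: two well-separated Gaussian blobs.
labeled_x = np.concatenate([rng.normal(0, 0.5, (20, 2)),
                            rng.normal(3, 0.5, (20, 2))])
labeled_y = np.array([0] * 20 + [1] * 20)
unlabeled_x = np.concatenate([rng.normal(0, 0.5, (100, 2)),
                              rng.normal(3, 0.5, (100, 2))])

# 1) Train a teacher on labeled data only.
teacher = train(labeled_x, labeled_y)

# 2) Iterate: pseudo-label unlabeled data with the clean teacher,
#    train a noised student on labeled + pseudo-labeled data,
#    then put the student back as the next teacher.
for _ in range(3):
    pseudo_y = predict(teacher, unlabeled_x)
    student_x = np.concatenate([labeled_x, add_noise(unlabeled_x)])
    student_y = np.concatenate([labeled_y, pseudo_y])
    teacher = train(student_x, student_y)

acc = (predict(teacher, labeled_x) == labeled_y).mean()
```

In the real method the student is at least as large as the teacher (a larger EfficientNet), which this sketch omits; the loop structure and the clean-teacher/noised-student asymmetry are the parts it preserves.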
Paper: https://arxiv.org/abs/1911.04252