Title: Learning visual representations from weakly annotated data

Abstract: An unprecedented amount of visual data is now available on the Internet. Wouldn't it be great if a machine could automatically learn from all this information? For example, imagine an autonomous robot that learns how to change a flat tire by watching instruction videos on YouTube, or how to navigate a city by observing street-view imagery. Learning from this data is, however, very challenging, as it comes with only weak supervisory signals, such as the human narration of an instruction video or noisy geotags on street-level imagery. In this talk, I will describe our recent progress on learning visual representations from such weakly annotated visual data.