Program schedule

Summary: Full-day workshop.

The day is split into half a day of keynotes and a round table, and half a day of papers delivered as oral presentations. Keynotes take place in the morning, the panel follows lunch, and paper presentations fill the afternoon.


Keynotes Chair: Daniela Braga

Keynote: Stefano Vegnaduzzo
Title: Crowdsourcing for highly unbalanced classes: challenges and opportunities.
Abstract: Today crowdsourcing is used for the most diverse tasks. Many tasks, often language-related, require human cognition to carry out association and/or verification: assigning a label to a text object, translation, speech transcription, photo tagging, etc. Associative tasks are typically performed on raw data that is available in large volumes without the association. When the data to be processed by the crowds represents the minority class(es) of a highly skewed distribution, new issues emerge: submitting large unlabeled volumes of data to the crowds is efficient neither operationally nor financially. This talk will discuss problems and solutions in using crowdsourcing platforms to label data for highly unbalanced classes.
Affiliation: Senior Director, Data Science at Integral Ad Science
Bio: Stefano Vegnaduzzo is a Vice President of Data Science at Integral Ad Science. His expertise includes natural language processing, search engines, conversational agents, and advertising technology. He has built and led data science teams that have used data collected through in-house analysts, crowdsourcing platforms or combinations of the two.

Keynote: Lyle Ungar
Title: Measuring Psychological Traits using Social Media
Abstract: The words and images people post on social media such as Twitter and Facebook provide a rich, if imperfect, view of who they are and what they care about. We show how crowdsourced data can be used to build models to predict people's age, gender, personality, and mental health from their social media. Such methods are increasingly used for applications ranging from targeted marketing to job candidate screening.
Affiliation: Professor, Univ. Pennsylvania
Bio: Dr. Lyle Ungar is a Professor of Computer and Information Science at the University of Pennsylvania, where he also holds appointments in multiple departments in the Schools of Business, Medicine, Arts and Sciences, and Engineering and Applied Science. Lyle received a B.S. from Stanford University and a Ph.D. from M.I.T. He has published over 200 articles, supervised two dozen PhD students, and is co-inventor on eleven patents. His current research focuses on developing scalable machine learning methods for data mining and text mining, including spectral methods for NLP, and analysis of social media to better understand the drivers of physical and mental well-being.

Keynote: Daniel S. Weld
Title: High Quality Crowdsourcing
Abstract: Requestors often complain about the low quality of crowd work, but whose fault is this? We argue that techniques like majority vote and expectation maximization (EM) miss the point and don’t solve the true, underlying problems: confusing task instructions and poor worker training. Instead we advocate three new methods: gated instruction, micro-argumentation, and self-improving workflows.
Affiliation: Professor at University of Washington
Bio: Daniel S. Weld is Thomas J. Cable / WRF Professor of Computer Science & Engineering and Entrepreneurial Faculty Fellow at the University of Washington. After formative education at Phillips Academy, he received bachelor's degrees in both Computer Science and Biochemistry at Yale University in 1982. He earned a Ph.D. from the MIT Artificial Intelligence Lab in 1988, received a Presidential Young Investigator's award in 1989 and an Office of Naval Research Young Investigator's award in 1990, and was named AAAI Fellow in 1999 and ACM Fellow in 2005. Dan was a founding editor of the Journal of AI Research, area editor for the Journal of the ACM, and guest editor for Computational Intelligence and Artificial Intelligence.

Keynote: Daniela Braga
Title: Challenges and opportunities when collecting data for bots
Abstract: In the AI revolution era we’re living in today, speaking bots, chat bots and robots are considered to be the next technology milestone in creating a delightful and natural human-computer interaction. However, if Alexa, Siri, Cortana, Google Now and other personal assistants make us believe that it’s almost possible to have a human-machine conversation, at least in English, the same is not true for other languages and other domains. The main reason behind this gap is the linguistic challenge and its impact on data. Language data, unlike image, sensor or bio signal data, has too many variables, making data collection and data labeling a challenge. This talk will walk through what it takes to create an end-to-end personal assistant from a data perspective, what the main challenges faced by speech and NLP scientists are, and how those challenges can be overcome, always keeping in mind quality, speed and scale.
Affiliation: Co-founder & CEO of DefinedCrowd corporation
Bio: Daniela Braga is the founder and CEO of DefinedCrowd, one of the fastest growing startups in the AI space. With seventeen years working in Speech Technology both in academia and industry in Portugal, Spain, China and the US, she has deep expertise in Speech Science and is one of the world leaders of Crowdsourcing adoption in large enterprises. Previously, at Microsoft, she worked in nearly all stacks of Speech Technology, shipped 26 languages for Exchange 14 and 10 TTS voices in Windows 8, and was involved in Cortana. At Voicebox, she created the Data Science and Crowdsourcing team, where she introduced Crowdsourcing for big data solutions and restructured the Engineering infrastructure around data collection, processing, ingestion, instrumentation, storage, browsing and discoverability. Her effort has resulted in reducing data collection and processing costs by 80%; her approach has been adopted in multiple organizations. Dr. Braga is a frequent guest lecturer at the University of Washington, USA, and is the author of more than 90 scientific papers and several patents.

Keynote: Ece Kamar
Title: Humans to the Rescue: Troubleshooting AI Systems with Human-in-the-loop
Abstract: As we increasingly rely on AI systems for high-stakes decisions in many domains ranging from judiciary to autonomous driving, understanding when, how and why these systems fail in the real world is critical. Despite a growing set of tools and algorithms developed for training machine learning models, relatively little attention has been paid to troubleshooting such systems in the real world. In this talk, I’ll present two general frameworks that utilize human input in diagnosing and characterizing errors of machine learning systems towards gradual improvements. First, I’ll discuss how human input can efficiently guide the discovery of blind spots of machine learning models caused by biases in training. Next, I’ll describe a human-in-the-loop framework for troubleshooting component-based systems.
Affiliation: Researcher at Microsoft Research
Bio: Ece Kamar is a researcher at the Adaptive Systems and Interaction group at Microsoft Research Redmond. Ece earned her Ph.D. in computer science from Harvard University. While at Harvard, she received the Microsoft Research fellowship and Robert L. Wallace Prize Fellowship for her work on Artificial Intelligence. She has served as a committee member for AAAI, IJCAI, AAMAS, HCOMP, UAI and was a member of the first AI 100 study panel. She works on a number of subfields of AI, including planning, machine learning and mechanism design. She is passionate about combining machine and human intelligence towards developing real-world applications.

Panel Discussion:

Moderator: Daniela Braga, CEO of DefinedCrowd



Papers chair: Michael Levit


Posters chair: Ece Kamar

Final Schedule

9:00 Keynotes session opening, Daniela Braga
9:10 Stefano Vegnaduzzo, Senior Director, Data Science at Integral Ad Science
9:40 Lyle Ungar, Professor, Univ. Pennsylvania
10:10 Daniel S. Weld, Professor at University of Washington
10:40 Coffee break
11:10 Daniela Braga, Co-founder & CEO of DefinedCrowd corporation
11:40 Ece Kamar, Researcher at Microsoft Research
12:10 Lunch
14:00 Panel discussion. Moderator: Daniela Braga, CEO of DefinedCrowd
Panelists: Ece Kamar (Microsoft Research), Jerome Bellegarda (Apple), Lyle Ungar (Univ. Pennsylvania), Gina-Anne Levow (Univ. of Washington)
14:30 Papers presentation, Chair: Michael Levit
14:30 Crowdsourced Continuous Improvement of Medical Speech Recognition, W.Salloum, E.Edwards, S.Ghaffarzadegan, D.Suendermann-Oeft, and M.Miller
14:50 Beyond Mechanical Turk: Using Techniques from Meta Learning to Compare Crowdsourcing Platforms Across Languages, Sarah Luger
15:10 Crowdsourcing Multimodal Dialog Interactions: Lessons Learned from the HALEF Case, V.Ramanarayanan, D.Suendermann-Oeft, H.Molloy, E.Tsuprun, P.Lange & K.Evanini
15:30 Coffee break
16:00 Complementing the Execution of AI Systems with Human Computation, E.Kamar, L.Manikonda
16:20 Regularization and Learning an Ensemble of RNNs by Decorrelating Representations, M. Yadav, S. Agarwal
16:30 Break
16:45 Poster session. Chair: Ece Kamar
18:30 Closing session


The conference will be publicized to groups in industry and academia with related work and interest in the field, including participants in previous workshops organized in the scope of HCOMP. We expect at least 100 attendees.