MLP on Wednesday, July 18th

09:00-10:30 Session 124G

Chair:

Bruno Marnette

Location: Blavatnik LT1

09:00

Pushmeet Kohli

TBA

ABSTRACT. In this talk, I will address the questions of
1. how we specify arbitrary tasks to a learning system,
2. how we interpret its behaviour, and finally
3. how we verify or debug it to ensure that its behaviour is
consistent with the task specification.
I will also describe my initial attempts to make progress on these
questions through program synthesis and verification.

09:45

Earl Barr

TBA

10:30-11:00 Coffee Break

11:00-12:30 Session 126G

Chair:

Viktor Kuncak

Location: Blavatnik LT1

11:00

Jules Villard

Static Analysis for Developer Efficiency with Infer

ABSTRACT. Infer is an open-source static analysis tool for Java, C, C++, and Objective-C. Infer has been successfully deployed at Facebook, where it identifies hundreds of potential bugs per month in mobile apps and backend code. Infer uses AI (Abstract Interpretation) and ML (more precisely the OCaml implementation) to analyse source code. This talk will present Infer and attempt to draw bridges between Infer and other AI/ML techniques.

11:45

Liam Atkinson

Learning to Type

12:30-14:00 Lunch Break

14:00-15:30 Session 127G

Chair:

Bruno Marnette

Location: Blavatnik LT1

14:00

Rishabh Singh

Neural Meta Program Synthesis

ABSTRACT. The key to attaining general artificial intelligence is to develop architectures that are capable of learning complex algorithmic behaviors modeled as programs. The ability to learn programs can allow these architectures to compose high-level abstractions, which can lead to many benefits: i) enabling neural architectures to perform more complex tasks, ii) learning interpretable representations (programs which can be analyzed, debugged, or modified), and iii) generalizing better to new inputs (like algorithms). In this talk, I will present some of our recent work in developing neural architectures for learning programs from examples, and also briefly discuss other applications such as program repair and fuzzing that can benefit from such neural program representations.

14:45

Martin Vechev

Learning to Analyze Programs at Scale

ABSTRACT. I will present two new results on machine learning-based program analysis. The first direction involves learning static analyzers from a given dataset of programs and is based on counter-example guided synthesis, decision tree learning and adversarial perturbations. The second direction involves learning rules that pinpoint program issues (e.g., security violations), and is based on learning from large datasets of program changes by using semantic abstractions and hierarchical clustering. In both cases, I will show that the methods successfully found issues missed by state-of-the-art, manually crafted systems.

15:30-16:00 Coffee Break

16:00-18:00 Session 129F

Chair:

Bruno Marnette

Location: Blavatnik LT1

16:00

Michael Pradel

DeepBugs: A Learning Approach to Name-based Bug Detection

ABSTRACT. Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This talk presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.

16:45

Ian Wright, Jean Helie and Albert Ziegler

Measuring software development productivity: a machine learning approach

SPEAKER: Ian Wright

ABSTRACT. We apply machine learning to version control data to measure software development productivity. Our models measure both the quantity and quality of produced code. Quantity is defined by a model that predicts the labor hours supplied by the 'standard coder' to make any code change, and quality is defined by a model that predicts the distribution of different kinds of problems identified by a static code analysis tool.

17:15

Ezra Winston, Bhuwan Dhingra, Kathryn Mazaitis, Graham Neubig and William Cohen

Answering Cloze-style Software Questions Using Stack Overflow

SPEAKER: Ezra Winston

ABSTRACT. Modern Question Answering (QA) systems rely on both knowledge bases (KBs) and unstructured text corpora as sources for their answers. KBs, when available, generally offer more precise answers than unstructured text. However, in specialized domains such as software engineering, QA requires deep domain expertise and KBs are often lacking. In this paper we tackle such specialized QA by using both text and semi-structured knowledge, in the form of a corpus of entity-labeled documents. We propose CASE, a hybrid of an RNN language model and an entity co-occurrence model, where the entity co-occurrence model is learned from the entity-labeled corpus. On QUASAR-S, a dataset derived from Stack Overflow consisting of Cloze (fill-in-the-blank) software questions and a corpus of tagged posts, CASE shows large accuracy gains over strong baselines.

19:15-21:30 Workshops dinner at Magdalen College

Workshops dinner at Magdalen College. Drinks reception from 7:15pm, to be seated by 7:45pm (pre-booking via FLoC registration system required; guests welcome).

Location: Magdalen College