Question-Answering with BERT

The holy grail of NLP! Here is an AI that can extract the answer from a passage of text when asked a specific question about that passage.

Contexts :

  1. the long-awaited takeover of universal pictures by mca , inc. happened in mid-1962 as part of the mca-decca records merger . the company reverted in name to universal pictures . as a final gesture before leaving the talent agency business , virtually every mca client was signed to a universal contract . in 1964 mca formed universal city studios , inc. , merging the motion pictures and television arms of universal pictures company and revue productions ( officially renamed as universal television in 1966 ) . and so , with mca in charge , universal became a full-blown , a-film movie studio , with leading actors and directors under contract ; offering slick , commercial films ; and a studio tour subsidiary launched in 1964 . television production made up much of the studio 's output , with universal heavily committed , in particular , to deals with nbc ( which later merged with universal to form nbc universal ; see below ) providing up to half of all prime time shows for several seasons . an innovation during this period championed by universal was the made-for-television movie .

  2. in the first volume of his parerga and paralipomena , schopenhauer wrote his " sketch of a history of the doctrine of the ideal and the real " . he defined the ideal as being mental pictures that constitute subjective knowledge . the ideal , for him , is what can be attributed to our own minds . the images in our head are what comprise the ideal . schopenhauer emphasized that we are restricted to our own consciousness . the world that appears is only a representation or mental picture of objects . we directly and immediately know only representations . all objects that are external to the mind are known indirectly through the mediation of our mind . he offered a history of the concept of the " ideal " as " ideational " or " existing in the mind as an image " .

  3. this large collection of meteorites allows a better understanding of the abundance of meteorite types in the solar system and how meteorites relate to asteroids and comets . new types of meteorites and rare meteorites have been found . among these are pieces blasted off the moon , and probably mars , by impacts . these specimens , particularly alh84001 discovered by ansmet , are at the center of the controversy about possible evidence of microbial life on mars . because meteorites in space absorb and record cosmic radiation , the time elapsed since the meteorite hit the earth can be determined from laboratory studies . the elapsed time since fall , or terrestrial residence age , of a meteorite represents more information that might be useful in environmental studies of antarctic ice sheets .

  4. policing in greater london , with the exception of the city of london , is provided by the metropolitan police service , overseen by the mayor through the mayor 's office for policing and crime ( mopac ) . the city of london has its own police force – the city of london police . the british transport police are responsible for police services on national rail , london underground , docklands light railway and tramlink services . a fourth police force in london , the ministry of defence police , do not generally become involved with policing the general public .

  5. argentine activists told a news conference that they would not try to snuff out the torch 's flame as demonstrators had in paris and london . " i want to announce that we will not put out the olympic torch , " said pro-tibet activist jorge carcavallo . " we 'll be carrying out surprise actions throughout the city of buenos aires , but all of these will be peaceful . " among other activities , protesters organized an alternative march that went from the obelisk to the city hall , featuring their own " human rights torch . " a giant banner reading " free tibet " was also displayed on the torch route . according to a representative from the ngo 'human rights torch relay ' , their objective was to " show the contradiction between the olympic games and the presence of widespread human rights violations in china "

  6. burrel 's innovative design , which combined the low production cost of an apple ii with the computing power of lisa 's cpu , the motorola 68k , received the attention of steve jobs , co-founder of apple . realizing that the macintosh was more marketable than the lisa , he began to focus his attention on the project . raskin left the team in 1981 over a personality conflict with jobs . team member andy hertzfeld said that the final macintosh design is closer to jobs ' ideas than raskin 's . after hearing of the pioneering gui technology being developed at xerox parc , jobs had negotiated a visit to see the xerox alto computer and its smalltalk development tools in exchange for apple stock options . the lisa and macintosh user interfaces were influenced by technology seen at xerox parc and were combined with the macintosh group 's own ideas . jobs also commissioned industrial designer hartmut esslinger to work on the macintosh line , resulting in the " snow white " design language ; although it came too late for the earliest macs , it was implemented in most other mid- to late-1980s apple computers . however , jobs ' leadership at the macintosh project did not last ; after an internal power struggle with new ceo john sculley , jobs resigned from apple in 1985 . he went on to found next , another computer company targeting the education market , and did not return until 1997 , when apple acquired next .

Questions :

  1. when did universal inaugurate its studio tour subsidiary ?

  2. what did schopenhauer believe we were restricted to ?

  3. about what nearby planet could the antarctic meteorites hold data ?

  4. what department runs the metropolitan police service ?

  5. what is the name of the activist who promised peaceful protests ?

  6. whose technology influenced the user interfaces of the lisa and macintosh ?

My predicted answers :

  1. '1964',

  2. 'our own consciousness',

  3. 'mars',

  4. "mayor's office for policing and crime",

  5. 'jorge carcavallo',

  6. 'steve jobs'

High level steps are :

  • Implement and train a (Distil)BERT model for question answering on a subset of the SQuAD v2.0 dataset.

  • Convert the data to tensors using the BERT tokenizer.

  • Train the question-answering model by fine-tuning a pre-trained BERT model.
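A question-answering head on top of BERT typically produces one start score and one end score per token, and the answer is the token span with the best combined score. The sketch below illustrates that decoding step in plain Python. It is not taken from the repository: the function name `best_span`, the `max_answer_len` parameter, and the score values are all hypothetical, chosen only to make the idea concrete.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,  # hypothetical cap on answer length, in tokens
) -> Tuple[int, int]:
    """Return the (start, end) token indices that maximize
    start_scores[start] + end_scores[end], subject to
    start <= end < start + max_answer_len."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Made-up scores for a fragment of context 1 (not real model output).
tokens = "a studio tour subsidiary launched in 1964 .".split()
start_scores = [0.1, 0.2, 0.0, 0.1, 0.3, 0.5, 4.0, 0.0]
end_scores = [0.0, 0.1, 0.2, 0.4, 0.1, 0.2, 3.5, 0.9]

s, e = best_span(start_scores, end_scores)
answer = " ".join(tokens[s:e + 1])
print(answer)
```

Requiring the start index to come no later than the end index rules out empty or reversed spans, which taking the two argmaxes independently would not.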

Detailed Steps :

  1. Convert data to BERT tensors

  2. Write a function to create question and answer spans

  3. Build a QAdataset Dataloader.

  4. Train BERT

  5. Evaluate on the dev set

  6. Test on the test set
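Steps 1 and 2 hinge on turning a character-level answer position (SQuAD gives the answer text and its character offset in the context) into start and end token indices the model can be trained on. Here is a minimal sketch of that mapping. It makes two assumptions that are not from the repository: a whitespace tokenizer stands in for the BERT WordPiece tokenizer so the example stays dependency-free, and the helper names `whitespace_tokenize` and `char_to_token_span` are hypothetical.

```python
from typing import List, Optional, Tuple


def whitespace_tokenize(text: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Split on whitespace and record each token's (char_start, char_end).
    Stand-in for a subword tokenizer that reports character offsets."""
    tokens, offsets = [], []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append(tok)
        offsets.append((start, end))
        pos = end
    return tokens, offsets


def char_to_token_span(
    offsets: List[Tuple[int, int]], answer_start: int, answer_text: str
) -> Optional[Tuple[int, int]]:
    """Map a character-level answer to inclusive (start_token, end_token).
    Returns None when the answer does not line up with the tokens."""
    answer_end = answer_start + len(answer_text)
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if start_tok is None and s <= answer_start < e:
            start_tok = i  # token containing the first answer character
        if s < answer_end <= e:
            end_tok = i  # token containing the last answer character
    if start_tok is None or end_tok is None:
        return None
    return start_tok, end_tok


context = "schopenhauer emphasized that we are restricted to our own consciousness ."
answer_text = "our own consciousness"
answer_start = context.index(answer_text)

tokens, offsets = whitespace_tokenize(context)
span = char_to_token_span(offsets, answer_start, answer_text)
print(span, tokens[span[0]:span[1] + 1])
```

With a real subword tokenizer the same logic applies to the offsets it reports. The unanswerable questions in SQuAD v2.0 need a separate convention, such as pointing both indices at the first token.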

Results :

I achieved 62% accuracy on start-span prediction and 66% accuracy on end-span prediction, with a test accuracy of 51.3%.
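For reference, the sketch below shows one way such span accuracies could be computed. It assumes start and end accuracy mean the fraction of examples whose predicted index equals the gold index. The "exact" figure (both indices correct) is an extra illustration, and the write-up does not say whether the test accuracy was computed that way. The function name `span_accuracies` and the sample spans are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start_token, end_token), inclusive


def span_accuracies(pred_spans: List[Span], gold_spans: List[Span]) -> Dict[str, float]:
    """Fraction of examples with a correct start, a correct end, and both."""
    if not gold_spans or len(pred_spans) != len(gold_spans):
        raise ValueError("need two non-empty lists of equal length")
    n = len(gold_spans)
    pairs = list(zip(pred_spans, gold_spans))
    start_hits = sum(p[0] == g[0] for p, g in pairs)
    end_hits = sum(p[1] == g[1] for p, g in pairs)
    exact_hits = sum(p == g for p, g in pairs)
    return {
        "start": start_hits / n,
        "end": end_hits / n,
        "exact": exact_hits / n,
    }


# Made-up predictions and labels, for illustration only.
pred = [(6, 6), (7, 9), (3, 5), (10, 12)]
gold = [(6, 6), (7, 9), (3, 4), (9, 12)]
metrics = span_accuracies(pred, gold)
print(metrics)
```

Exact match can never exceed the lower of the start and end accuracies, since it requires both indices to be right on the same example.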

Source code: https://github.com/shounakds2019/QAwithBERT

