Natural Language Processing PCPR Pun Paper Presentation


The sheer chaos and complexity of the human language system, and of our multi-layered, multi-dimensional ways of communicating, boggle my mind all the time, especially with respect to the nature of internet culture. People use specific intonations to denote sarcasm, facial expressions to convey the frivolity of a joke, and baaarely contained glee over a good-bad pun. How are we to get machines on that level, on that wavelength / frequency with us?

I would write an ode to the TikTok community, specifically my For You page: my side of the world, my hyper-specific, relatable niche interests delivered in endless loops of voices, characters, stories, and discourse. I have so much love, hope, and well-wishes for these specific personas and for the para-social commentary born out of self-discovery, empathy, and shared collective trauma processing, all of it educational and entertaining too. YouTube ushered in the era of entertaining education, but TikTok has a unique X-factor specific to the current hyper-aware generation, given the pandemic circumstances and the general state of the world. Best of all, it offers quick, cited information. It also quells my knowledge FOMO (Fear of Missing Out).

From @jajwalyark Instagram Highlight Musicals

The range of linguistic developments in the art of meme-ing fascinates me: inside-reference jokes, a whole range of emotions categorized by one phrase, image, gif, or sound… From the days of Tumblr to Vine to TikTok now, the possibilities for layered interactions are endless. Tumblr posts went to Facebook or Instagram to die, and now TikTok videos go to Instagram or Twitter for their second life, even as those two platforms are dying out themselves.

I imagine internet pop culture as a beast carrying its 1.3 quintillions of data over its shoulders the way the giant Atlas would, like a shroud, and slowly crumbling into new civilizations and religious orders à la one of the American Gods.

So when I found this paper, on machines recognizing puns using pronunciation, among the papers offered for team presentations in my AIT 590 coursework, I immediately emailed the professor about it, even before my team was finalized, and even though the first six presentations would receive extra credit and this paper was slotted at the end. Sigh.

Here are some of my talking points and notes from my research into the paper itself and the related material. I adore making presentations on Focusky. Not only do I get to focus on the substance of the paper in my own words (nothing in a classroom gets me more zoned out than presentations that are simply words read off the screen), I can also wow the viewers, even through video conference calls, with the visual effect of physically moving across large swathes of knowledge in easily digestible form. It has not failed me yet.

“‘The Boating Store Had Its Best Sail Ever’: Pronunciation-attentive Contextualized Pun Recognition.”


Heterographic puns rely on words that are spelled differently but sound the same or similar.

Homographic puns rely on multiple interpretations of the same written expression; the ambiguity is visible on the page and depends on context.


  • Pronunciation-attentive Contextualized Pun Recognition (PCPR)
  • Pun Detection and Location
  • Each word is modeled through its context plus its corresponding phonetic symbols.
  • Evaluated on 2 benchmark datasets.
  • Significantly outperforms the state-of-the-art methods in pun detection and location tasks.
  • In-depth analyses verify the effectiveness and robustness of PCPR.


Yichao Zhou, Ph.D. Candidate (Information & Data Management), University of California, Los Angeles.
Jyun-Yu Jiang, Ph.D. Candidate (Computer Science), University of California, Los Angeles.
Jieyu Zhao, Ph.D. Candidate (Computer Science), University of California, Los Angeles.
Kai-Wei Chang, Assistant Professor, Computer Science, University of California, Los Angeles.
Wei Wang, Professor, Department of Computational Medicine, University of California, Los Angeles.

Previous Work

Pun Recognition and Generation

  • Deploying word sense disambiguation methods
  • Using external knowledge base
  • Excludes Heterographic puns
  • No pre-trained embedding model (Pedersen, 2017; Oele and Evang, 2017)
  • Leveraging static word embedding techniques
    • Excludes contextual puns (Hurtado et al., 2017; Indurthi and Oota, 2017; Cai et al., 2018).

Future Possibilities

  • Apply the proposed model to other problems:
    • General humor recognition
    • Irony discovery
    • Sarcasm detection

Word Embedding Process

  • Input text = sequence of N words, where each word has M phonemes.
  • For example, the phonemes for “pun” are {P, AH, N}.
  • Pun detection is a binary sentence classification problem.
  • Pun location can be modeled as a sequential tagging task that assigns a binary label to each word.
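
To make those two framings concrete for myself, here is a minimal Python sketch of what the labels could look like. The sentence is the pun from the paper's title; the tags are hand-assigned by me for illustration and the variable names are my own, not anything taken from the paper or its datasets.

```python
# Illustrative only: labels below are hand-assigned, not drawn from a dataset.
sentence = "The boating store had its best sail ever"
words = sentence.split()  # the input text as a sequence of N words

# Pun location as sequential tagging: one binary label per word,
# 1 for the pun word and 0 for every other word.
location_tags = [1 if word == "sail" else 0 for word in words]

# Pun detection as binary sentence classification: one label for the
# whole sentence (derived from the tags here purely for illustration).
contains_pun = int(any(location_tags))

for word, tag in zip(words, location_tags):
    print(f"{word:8s} {tag}")
print("contains_pun =", contains_pun)
```

A detector only has to produce the single `contains_pun` label, while a locator has to get the whole `location_tags` sequence right, which is why location is the harder of the two tasks to score well on.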

Publicly available benchmark datasets

SemEval 2017 shared task 7: PCPR > 87%-90% Accuracy

An ongoing series of evaluations of computational semantic analysis systems
SIGLEX (Special Interest Group on the Lexicon of the Association for Computational Linguistics)

By applying the pronunciation-attentive representations, different words with similar pronunciations are linked, which makes it much easier to pinpoint the pun word in the heterographic dataset.
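
A toy way to see why pronunciation links words that spelling keeps apart: compare their phoneme sequences. The sketch below is my own crude stand-in, not the paper's learned representation. The mini dictionary, the `phoneme_overlap` helper, and the position-by-position scoring are all assumptions made for illustration, with ARPAbet-style symbols and stress marks left out.

```python
# Hypothetical mini pronunciation dictionary (ARPAbet-style, stress omitted).
PRONUNCIATIONS = {
    "sail": ("S", "EY", "L"),
    "sale": ("S", "EY", "L"),
    "seal": ("S", "IY", "L"),
    "pun": ("P", "AH", "N"),
}


def phoneme_overlap(word_a, word_b):
    """Share of aligned positions where the two words have the same phoneme.

    A deliberately crude similarity score, used only to illustrate the idea.
    """
    phones_a = PRONUNCIATIONS[word_a]
    phones_b = PRONUNCIATIONS[word_b]
    matches = sum(a == b for a, b in zip(phones_a, phones_b))
    return matches / max(len(phones_a), len(phones_b))


print(phoneme_overlap("sail", "sale"))  # identical pronunciations
print(phoneme_overlap("sail", "seal"))  # similar pronunciations
print(phoneme_overlap("sail", "pun"))   # unrelated pronunciations
```

"Sail" and "sale" look nothing alike to a spelling-only model, but their phoneme sequences match exactly, which is the kind of link a heterographic pun depends on.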

“Improve word embedding using both writing and pronunciation.”

Wenhao Zhu et al., PloS one, 2018

Pun of the Day (PTD): PCPR > 98% Accuracy

Reveal those contradictions of meanings
Phonetic embeddings are intuitively useful for recognizing identically pronounced words.

  1. Use BERT, without loss of generality, to derive contextualized word embeddings.
  2. Apply the attention mechanism to
    a) Identify important phonemes.
    b) Derive the pronunciation embedding for each word.
  3. Capture the overall representation for each word
    Self-attentive encoder = Contextualized Word Embeddings + Pronunciation Embeddings
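
Here is how I picture steps 2 and 3, as a small pure-Python sketch. Everything in it is an assumption for illustration: the tiny hand-written vectors stand in for learned phoneme embeddings and for a BERT output, the `query` vector stands in for learned attention parameters, and I read the "+" in step 3 as concatenation. The self-attentive encoder that runs over the joined vectors is left out entirely.

```python
import math


def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def pronunciation_embedding(phoneme_vectors, query):
    """Attention-weighted sum of one word's phoneme vectors."""
    # Score each phoneme by its dot product with the query vector.
    scores = [sum(p * q for p, q in zip(vec, query)) for vec in phoneme_vectors]
    weights = softmax(scores)  # higher weight = more important phoneme
    dim = len(phoneme_vectors[0])
    pooled = [
        sum(weight * vec[i] for weight, vec in zip(weights, phoneme_vectors))
        for i in range(dim)
    ]
    return pooled, weights


# Toy stand-ins (assumed values) for the phonemes of "pun": P, AH, N.
phoneme_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, 0.5]              # stand-in for learned attention parameters
contextual = [0.2, -0.1, 0.7]   # stand-in for a contextualized word embedding

pron, weights = pronunciation_embedding(phoneme_vectors, query)
joint = contextual + pron       # list concatenation of the two embeddings

print("attention weights:", weights)
print("joint representation length:", len(joint))
```

The attention weights are what the paper's heat-map style visualizations are built from: the phonemes with the largest weights contribute most to the word's pronunciation embedding.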

Ablation Study and Analysis

In an ablation study, parts of the deep neural network are removed in order to gain a better understanding of the network’s behaviour.
PCPR dramatically improves the pun location and detection performance, compared to the SOTA models, Joint and CPR.

Visualization of attention weights of each pun word (marked in pink) in the sentences.
A deeper color indicates a higher attention weight.


J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv:1810.04805 [cs], May 2019, Accessed: Nov. 04, 2020. [Online]. Available:

E. Fosler-Lussier, W. Byrne, and D. Jurafsky, eds. 2005. Speech Communication Special Issue on Pronunciation Modeling and Lexicon Adaptation, 46:2, June 2005.

Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice-Hall.

Zhou, Yichao and Jiang, Jyun-Yu and Zhao, Jieyu and Chang, Kai-Wei and Wang, Wei. “‘The Boating Store Had Its Best Sail Ever’: Pronunciation-attentive Contextualized Pun Recognition.” (accessed Nov. 02, 2020).
