Unlocking insights, enhancing experiences, hearing the whispering missing knowledge: The NLP advantage

5 min read · Jan 13, 2023

Moving from scaling, syntax, grammar, and data towards reasoning, understanding, and the resolution of common assumptions!

🧩 I have been looking into K-fold cross-validation on chatbot conversational data. I think the quantifiers we set for Natural Language Understanding (NLU) can tell us there is a problem with the data, but resolving it takes some assumption resolution and disambiguation of misclassifications.

The methods I come up with always pull me towards correcting the training phrases and the dataset: adding more training phrases, or removing some from confused intents.

🦹‍♂️ I feel there is a gap between the objective of approximating the semantic space of the intents and the objective of understanding what stakeholders expect from a natural language understanding system. I see a pattern in the kinds of questions asked in the conversational system. Clustering the utterances in semantic space and recycling those texts as training phrases will help me, but does it really bridge the gap between the context of thought and language the customer is in and the context and actions of the chatbot?

The path towards “training phrases” for NLU looks like a brute-force approach: expand the boundary, then, after K-fold cross-validation, use something like dynamic programming to shrink the space. It's a kind of play between O(n) and O(log n). I kind of made up this analogy! 🥹

I’m trying to navigate my way through the uncharted terrain of the survival and evolution phases in Natural Language Understanding, with the help of my trusty intuitions, a compass (a deep learning framework), and a sense of humor (accepting some understanding gap and moving on to bridge it). As a person, I have evolved my vocabulary. We are in the approximate vicinity of the vocabulary that large language models are trained on. The dot product of the vocabulary works! I am still astounded by how we came up with embeddings, making language a computing resource.
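To make "the dot product works" concrete, here is a minimal sketch. The three-dimensional vectors are made-up toy values, not the output of any real embedding model; real embeddings have hundreds of dimensions. The point is only that related words end up with a larger normalized dot product (cosine similarity) than unrelated ones.

```python
import math

# Hypothetical toy "embeddings", invented for illustration only.
toy_embeddings = {
    "loan": [0.9, 0.1, 0.0],
    "mortgage": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalized by the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

loan_vs_mortgage = cosine(toy_embeddings["loan"], toy_embeddings["mortgage"])
loan_vs_pizza = cosine(toy_embeddings["loan"], toy_embeddings["pizza"])

print(f"loan vs mortgage: {loan_vs_mortgage:.3f}")
print(f"loan vs pizza:    {loan_vs_pizza:.3f}")
```

With these toy values, "loan" sits much closer to "mortgage" than to "pizza", which is the kind of neighborhood an intent classifier relies on.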

I have seen lots of posts on K-fold cross-validation: how to split the dataset, test the model, change the dataset, and retrain the model. TADA 🪄 you have better scores! I got somewhat lost trying to find implementable instructions in K-fold cross-validation results. There are cases of overfitting and underfitting! We have the percentage of misclassifications and the percentage of fallbacks. It's a juggle between confusion and generalization.
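Here is a rough sketch of what such a K-fold run could look like, so the two percentages above have something concrete behind them. Everything in it is an assumption for illustration: the training phrases are invented, the "model" is a toy word-overlap scorer rather than a real NLU engine, and the fallback rule (no overlapping words at all) is a stand-in for a confidence threshold.

```python
from collections import Counter

# Hypothetical labelled training phrases: (utterance, intent).
DATA = [
    ("what is my account balance", "balance"),
    ("i want to apply for a loan", "loan"),
    ("show my balance please", "balance"),
    ("how do i get a home loan", "loan"),
    ("how much money is in my account", "balance"),
    ("loan application status", "loan"),
    ("check account balance", "balance"),
    ("apply for personal loan", "loan"),
]

FALLBACK = "fallback"

def train(examples):
    """Count how often each word appears under each intent."""
    counts = {}
    for text, intent in examples:
        counts.setdefault(intent, Counter()).update(text.split())
    return counts

def predict(counts, text, min_score=1):
    """Pick the intent with the highest word-overlap score, or fall back."""
    words = text.split()
    scores = {intent: sum(c[w] for w in words) for intent, c in counts.items()}
    best = max(sorted(scores), key=scores.get)  # sorted() makes ties deterministic
    return best if scores[best] >= min_score else FALLBACK

def k_fold(data, k):
    """Yield (train, test) splits; fold i tests on every k-th example from i."""
    for i in range(k):
        test = data[i::k]
        train_part = [ex for j, ex in enumerate(data) if j % k != i]
        yield train_part, test

def cross_validate(data, k=4):
    """Run K-fold and report misclassification rate, fallback rate, confusions."""
    wrong = fallbacks = total = 0
    confusions = Counter()
    for train_part, test in k_fold(data, k):
        model = train(train_part)
        for text, gold in test:
            pred = predict(model, text)
            total += 1
            if pred == FALLBACK:
                fallbacks += 1
            elif pred != gold:
                wrong += 1
                confusions[(gold, pred)] += 1
    return {
        "misclassification_rate": wrong / total,
        "fallback_rate": fallbacks / total,
        "confusions": confusions,
    }

report = cross_validate(DATA, k=4)
print(report)
```

The `confusions` counter is the part that points back at the data: each `(gold, predicted)` pair names two intents whose training phrases overlap. It still only measures classification, not understanding.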

Do you see where I am heading? The quantifier is missing! What level of quantification in NLU differentiates the improvements in metrics across consecutive runs on the dataset? Is this a quantification of understanding, or is it a quantification of text classification?

In the case of enterprise conversational AI, we are told to answer within the scope of an existing knowledge base (FAQ) and its associated factual mappings! We are only trying to learn different ways of asking questions so we can map them to the existing knowledge base. ChatGPT might look like the best conversational AI and reasoning system, and I am amazed too, but in an enterprise setting, customers may not have the patience to probe its reasoning in order to alter it!

There is a need for efficient probing sometimes!

This comes down to the deduction of knowledge and the deduction of reasoning. We are working towards a larger volume of data to better the understanding while, at the same time, balancing the data across each intent to enhance generalization.

I am trying to see if I could narrow it down to a finite space of background knowledge. Consider a customer chatting with a finance company. The customer has some background knowledge of what they are looking for, but might need just one extra piece of knowledge to confirm their assumptions. Chatbots are also fed some background knowledge about the finance domain. I am thinking about what this aspect of context would look like in a vector representation.

The key performance indicators reported for NLU, such as the self-service rate and the fallback rate versus the percentage of transfers to agents, do not relate directly to NLU.
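A small sketch of how these KPIs are typically computed may show why. The log format and the exact definitions below are my assumptions (teams define these rates differently), and the conversations are invented. Notice that nothing in the calculation looks at whether an intent was understood correctly, only at how the conversation ended.

```python
# Hypothetical conversation log; field names are assumptions for illustration.
conversations = [
    {"id": 1, "fallbacks": 0, "transferred": False},
    {"id": 2, "fallbacks": 2, "transferred": True},
    {"id": 3, "fallbacks": 1, "transferred": False},
    {"id": 4, "fallbacks": 0, "transferred": False},
]

def kpis(convs):
    """Compute outcome-level rates over a list of conversations."""
    n = len(convs)
    transferred = sum(c["transferred"] for c in convs)
    with_fallback = sum(c["fallbacks"] > 0 for c in convs)
    return {
        # Assumed definition: conversations that never reached a human agent.
        "self_service_rate": (n - transferred) / n,
        "transfer_rate": transferred / n,
        # Assumed definition: conversations with at least one fallback turn.
        "fallback_rate": with_fallback / n,
    }

summary = kpis(conversations)
print(summary)
```

A conversation can count as "self-service" even when the bot confidently answered the wrong question, which is exactly the gap between these numbers and NLU quality.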

A very talented mentor came up with a partition problem solution in Clojure (01/12/23) where the numbers kind of overlap: there is an overlap of numbers within the sequence that can be represented as partitions. I was trying to see what word sense overlap would look like, and whether partitioning would help with resolution across different entities. Would switching between different partition templates end up causing semantic drift? Approaching NLU with every possible phrasing might land us in a curse of training phrases, similar to the curse of dimensionality!

"The dog chased the cat", "the cat got away", and "the dog ran behind"

Cases of context: RLHF. I sometimes feel it's similar to the precaution we take against randomness in text generation models. Here is an example where the context is injected wrongly, a case of “context injection”:

User: I ate pizza with a fork, what’s the topping used in this sentence?

Bot: It is not specified in the sentence what the topping used in the pizza is. The sentence only states that you ate pizza with a fork.

User: Are you sure? looks like the fork is the topping

Bot: I apologize, you are correct that “fork” is not a topping, it is a utensil used to eat food. My previous answer was incorrect and did not address the context of the question.

It's not about the learning here, it's about the confidence in fitting the context of previous questions!

Never Ending Pathway = Acquisition of understanding of the customers vs Acknowledging the customers!

Written by Sangeetha Venkatesan