Google’s new SMITH algorithm (and how it outperforms BERT)
Google has a new search engine algorithm called SMITH, and according to Google's own research, it outperforms BERT at understanding long-form queries and content.
Whether Google is actually using SMITH in live search remains a mystery. Google rarely says which specific algorithms it is using at any given time, so SMITH may or may not be in production right now.
That, however, does not diminish the value of understanding how the algorithm works. In my opinion, SMITH offers a fascinating insight into the direction Google is moving as a search engine and how it sees the future of online content and content consumption.
What is SMITH?
Put simply, SMITH, or Siamese Multi-depth Transformer-based Hierarchical Encoder, is a new search engine algorithm from Google that focuses on understanding long-form documents. More specifically, SMITH is particularly good at understanding the context of individual passages within long-form content.
How is SMITH different from BERT?
SMITH and BERT appear to be related; in fact, SMITH looks like an extension of BERT.
While SMITH deals with understanding passages within the context of documents, BERT is trained to understand words within the context of sentences.
When it comes to understanding long-form content, BERT has limitations that SMITH does not.
According to a research whitepaper by Google:
“In recent years, self-attention based models like Transformers… and BERT …have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length.
In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. Our model contains several innovations to adapt self-attention models for longer text input.”
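To make that quadratic cost concrete, here is a minimal illustrative sketch (my own toy example, not anything from the paper) of why full self-attention gets expensive as the input grows: every token attends to every other token, so the number of pairwise scores grows with the square of the sequence length.

```python
def self_attention_scores(seq_len: int) -> int:
    """A single self-attention layer computes one score for every
    (query token, key token) pair, i.e. seq_len * seq_len scores."""
    return seq_len * seq_len

# The cost grows quadratically with input length, which is one reason
# BERT-style models are typically capped at around 512 tokens:
for n in (128, 512, 2048):
    print(f"{n:>5} tokens -> {self_attention_scores(n):>12,} pairwise scores")
# Going from 512 to 2048 tokens (4x the length) means 16x the attention scores.
```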
The whitepaper also explains why understanding long documents is more challenging:
“Semantic matching between long texts is a more challenging task due to a few reasons:
- When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;
- Long documents contain internal structure like sections, passages and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;
- The processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.”
The results
BERT is limited when it comes to understanding longer documents. SMITH, on the other hand, is designed for long inputs and outperforms BERT-based baselines on long-form document matching. According to the whitepaper:
“Experimental results on several benchmark data for long-form text matching… show that our proposed SMITH model outperforms the previous state-of-the-art models and increases the maximum input text length from 512 to 2048 when comparing with BERT based baselines.”
Elsewhere in the paper, the authors name the baselines that SMITH beats:
“Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention…, multi-depth attention-based hierarchical recurrent neural network…, and BERT.
Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048.”
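To put the 512 vs. 2048 numbers in perspective, here is a rough, hypothetical sketch (the tokenizer and splitting logic are deliberately simplified and are not SMITH's actual implementation) of what a hard input limit means for a long article, and how a hierarchical, block-based approach sidesteps it:

```python
def simple_tokenize(text: str) -> list[str]:
    """Very rough whitespace tokenizer; real models use subword tokenizers,
    which typically produce even more tokens per word."""
    return text.split()

def truncate_for_bert(tokens: list[str], max_len: int = 512) -> list[str]:
    """A vanilla BERT-style encoder only sees the first max_len tokens;
    everything after that is simply cut off."""
    return tokens[:max_len]

def split_into_blocks(tokens: list[str], block_len: int = 64) -> list[list[str]]:
    """A hierarchical encoder first splits the document into smaller blocks,
    encodes each block, then combines the block representations into a
    document-level representation."""
    return [tokens[i:i + block_len] for i in range(0, len(tokens), block_len)]

article = " ".join(["word"] * 2000)          # stand-in for a ~2,000-word article
tokens = simple_tokenize(article)

print(len(truncate_for_bert(tokens)))        # 512 -> the rest of the article is ignored
print(len(split_into_blocks(tokens)))        # 32 blocks -> the whole article is covered
```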
Pre-training and sentence blocks
Pre-training is a tried and tested method that not only produces excellent results but also helps a model mature over time.
In pre-training, random words in a sentence are hidden (masked), and the model has to predict them. The model keeps learning from these predictions and, eventually, makes fewer mistakes.
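As a rough illustration (a toy sketch of the general idea, not Google's training code), masked word prediction looks something like this: hide a small fraction of the words and train the model to recover them.

```python
import random

def mask_words(sentence: str, mask_rate: float = 0.15, mask_token: str = "[MASK]"):
    """Toy version of masked word pre-training: hide a fraction of the words;
    the model is then trained to predict the hidden originals."""
    words = sentence.split()
    masked, targets = [], {}
    for i, word in enumerate(words):
        if random.random() < mask_rate:
            targets[i] = word          # what the model must predict
            masked.append(mask_token)  # what the model actually sees
        else:
            masked.append(word)
    return " ".join(masked), targets

text = "search engines need to understand the context of long documents"
masked_text, targets = mask_words(text)
print(masked_text)  # e.g. "search engines need to [MASK] the context of long documents"
print(targets)      # e.g. {4: 'understand'}
```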
According to the whitepaper:
“Inspired by the recent success of language model pre-training methods like BERT, SMITH also adopts the “unsupervised pre-training + fine-tuning” paradigm for the model training.
For the SMITH model pre-training, we propose the masked sentence block language modeling task in addition to the original masked word language modeling task used in BERT for long text inputs.”
In the case of SMITH, whole blocks of sentences are hidden during pre-training in addition to individual words. This is a key part of how SMITH operates.
“When the input text becomes long, both relations between words in a sentence block and relations between sentence blocks within a document becomes important for content understanding. Therefore, we mask both randomly selected words and sentence blocks during model pre-training.”
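Extending the toy example above (again an illustrative sketch, not the paper's implementation, and showing only the block-masking half of the combined objective), sentence-block masking hides whole blocks of sentences so the model has to learn how blocks relate to each other across the document:

```python
import random

def mask_sentence_blocks(document, mask_rate=0.2, block_mask="[MASKED_BLOCK]"):
    """Toy version of SMITH-style sentence block masking: treat each sentence
    as a block and hide whole blocks; the model must predict the masked block
    from the surrounding document context."""
    masked_doc, targets = [], {}
    for i, sentence in enumerate(document):
        if random.random() < mask_rate:
            targets[i] = sentence          # the block the model must recover
            masked_doc.append(block_mask)  # what the model actually sees
        else:
            masked_doc.append(sentence)
    return masked_doc, targets

doc = [
    "SMITH is a hierarchical encoder for long documents.",
    "It splits each document into sentence blocks.",
    "Block representations are then combined into a document representation.",
]
masked_doc, targets = mask_sentence_blocks(doc)
print(masked_doc)  # e.g. ['SMITH is a ...', '[MASKED_BLOCK]', 'Block representations ...']
print(targets)     # e.g. {1: 'It splits each document into sentence blocks.'}
```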
Conclusion
It is important to note that SMITH does not replace BERT. Instead, SMITH supplements BERT by doing what BERT is unable to do.
If you want to learn more about SMITH, you can read Google's original research paper, “Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching.”