Author: Kang Leng Chiew
Abstract:
The rise of attention mechanisms has significantly advanced the state of the art in various natural language processing (NLP) tasks, particularly in machine translation, question answering, and text summarization. This survey explores the conceptual evolution, implementation strategies, and comparative performance of attention-based architectures up to 2018. We categorize attention into global, local, self-attention, and hierarchical attention mechanisms, analyzing their integration into encoder-decoder models, recurrent neural networks (RNNs), and more recently, non-recurrent frameworks such as the Transformer. Self-attention, which allows models to weigh relationships between tokens in a sequence regardless of their distance, is shown to offer both computational efficiency and improved long-range dependency handling. The Transformer model, introduced in 2017, marks a paradigm shift by eliminating recurrence altogether while delivering superior performance in tasks like machine translation.
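The self-attention operation summarized above can be illustrated with a minimal sketch of scaled dot-product attention in the style of the Transformer. All names and dimensions below are illustrative assumptions (the projection matrices are randomly initialized here, whereas in practice they are learned), not the survey's own implementation.

```python
import numpy as np

def scaled_dot_product_self_attention(X, Wq, Wk, Wv):
    """Minimal scaled dot-product self-attention sketch.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) query/key/value projections
    (illustrative; learned parameters in a real model).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Every token scores against every other token, regardless of distance,
    # which is what gives self-attention its long-range dependency handling.
    scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Tiny illustrative run with random embeddings and projections.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out, attn = scaled_dot_product_self_attention(X, Wq, Wk, Wv)
```

Because the score matrix is computed for all token pairs at once, the operation is fully parallel across the sequence, in contrast to the step-by-step processing of an RNN.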