US2020410337(AMAZON TECH INC [US])
[0149] The above-described techniques can be applied to any tensor operations or any operations that include matrix multiplications, such as operations of a multi-layer perceptron described above with respect to FIG. 1. In one example, as an alternative to convolutions, a Transformer for natural language processing (NLP) may encode each position and apply an attention mechanism to relate two distant words; these operations can be parallelized to accelerate training. The attention mechanism in the Transformer is a way of computing the relevance of a set of values (e.g., information) based on some keys and queries. The attention mechanism can be used by the Transformer to focus on relevant information based on what it is currently processing. The attention weights may represent the relevance of the encoder hidden states (e.g., values) in processing the decoder state and may be calculated from the encoder hidden states (e.g., keys) and the decoder hidden state (e.g., queries). A Transformer can reduce the number of sequential operations needed to relate two symbols from input/output sequences to a constant O(1) number of operations by using a multi-head attention mechanism that can model dependencies regardless of their distance in an input or output sentence.
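The query/key/value computation described above can be sketched in a few lines of NumPy. This is a minimal illustrative example, not code from the patent; the function names and toy shapes are assumptions (two decoder queries attending over three encoder hidden states):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Weight each value by the relevance of its key to each query."""
    scores = queries @ keys.T            # relevance of each key to each query
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 per query
    return weights @ values              # weighted combination of the values

# Toy example: 2 decoder states (queries) over 3 encoder hidden states (keys/values).
rng = np.random.default_rng(0)
q = rng.random((2, 4))
k = rng.random((3, 4))
v = rng.random((3, 4))
out = attention(q, k, v)   # shape (2, 4): one context vector per query
```

Note that every score row is computed independently, which is why, unlike a recurrent model, the whole attention map can be evaluated in parallel.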
WO2022241190(H LEE MOFFITT CANCER CT & RES [US])
[0050] Transformer-based machine learning models are deep learning models commonly used in the field of natural language processing (NLP). Transformer-based machine learning models have an encoder-decoder architecture, where a plurality of encoder layers iteratively process the input layer-by-layer and a plurality of decoder layers iteratively process the output layer-by-layer. Each encoder and decoder layer also includes an attention unit (e.g., scaled dot-product attention) that weights the relevance of the layer inputs. This disclosure contemplates that the attention unit can be implemented using a computing device (e.g., a processing unit and memory as described herein). Additionally, each encoder and decoder layer includes an artificial neural network. An example transformer-based machine learning model for NLP is Bidirectional Encoder Representations from Transformers (BERT), developed by Google LLC of Mountain View, California. It should be understood that BERT is provided only as an example. This disclosure contemplates that the transformer-based machine learning models described herein can be models other than BERT.
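The scaled dot-product attention unit named in the excerpt computes softmax(QKᵀ/√dₖ)V: dot products of queries with keys, scaled by the square root of the key dimension to keep the softmax well-conditioned, then used to weight the values. A minimal NumPy sketch (illustrative only; the shapes and names are assumptions, not from the patent text):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(q @ k.T / sqrt(d_k)) @ v, returning output and weights."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # scaling keeps score magnitudes moderate
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v, weights

# Toy inputs: 2 queries, 3 key/value pairs, key dimension 2.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
k = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v = np.array([[1.0], [2.0], [3.0]])
out, weights = scaled_dot_product_attention(q, k, v)
```

Without the 1/√dₖ factor, dot products grow with the key dimension and push the softmax into regions of vanishing gradient, which is the motivation given for the scaling in the original Transformer paper.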
US11238240(DIGITAL ASSET CAPITAL INC [US])
[0068] Some embodiments may generate and use a set of attention values to perform one or more NLP operations to extract or categorize the text of a natural-language-text document, where the attention values may be used to weigh or otherwise modify an output of a neural network. Various methods may be used to determine or use attention values. For example, some embodiments may use a multi-headed attention-based autoencoder, such as autoencoders using a model similar to those described by Vaswani et al. (Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention Is All You Need." In Advances in Neural Information Processing Systems, pp. 5998-6008, 2017, arXiv:1706.03762) or Devlin et al. (Devlin, J., Chang, M. W., Lee, K., and Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805), which are incorporated by reference in their entirety.
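In the multi-head scheme of Vaswani et al. cited above, the input is linearly projected into several lower-dimensional "heads," each head attends independently, and the per-head outputs are concatenated and projected back. The following NumPy sketch shows self-attention under that scheme; the weight matrices and dimensions are illustrative assumptions, not values from any cited embodiment:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o):
    """Self-attention with num_heads parallel heads over sequence x."""
    seq, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split the model dimension into heads: (heads, seq, d_head).
    q = (x @ w_q).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head scores
    heads = softmax(scores) @ v                          # (heads, seq, d_head)
    # Concatenate heads back to (seq, d_model) and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ w_o

# Toy configuration: sequence of 5 tokens, model width 8, 2 heads.
rng = np.random.default_rng(0)
d_model, seq, h = 8, 5, 2
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
x = rng.standard_normal((seq, d_model))
out = multi_head_attention(x, h, w_q, w_k, w_v, w_o)  # shape (5, 8)
```

Because each head attends over the full sequence, any pair of positions is related in a single step, which is the constant-O(1)-path-length property noted in the first excerpt above.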