JOURNAL ARTICLE

Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings

Abstract

In this work, we observe an interesting phenomenon: it is possible to generate reversible sentence embeddings that allow an LLM to reconstruct the original text exactly, without modifying the model's weights. This is achieved by introducing a special memory token, whose embedding is optimized through training on a fixed sequence. When prompted with this embedding, the model reconstructs the fixed sequence exactly. We evaluate this phenomenon across English and Spanish datasets, sequences of up to approximately 240 tokens, and model scales ranging from 100M to 8B parameters. Notably, Llama 3.1 8B successfully reconstructs all tested sequences. Our findings highlight an interesting capability of LLMs and suggest potential applications in memory-based retrieval, compression, and controlled text generation.
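The mechanism described in the abstract can be sketched with a toy stand-in. In place of a frozen LLM, the sketch below uses a frozen random linear readout per position, and optimizes a single "memory" embedding by gradient descent on cross-entropy until greedy decoding reproduces a fixed token sequence exactly. The decoder, dimensions, and sequence here are purely illustrative assumptions, not the paper's actual setup (which optimizes a special token's input embedding against a real LLM with frozen weights).

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 48, 6                 # embedding size and toy vocabulary (illustrative)
target = [3, 1, 4, 1, 5, 2]      # the fixed sequence to memorize

# Frozen "decoder": one random readout matrix per output position.
# Only the memory embedding e is trained; W stays fixed throughout.
W = [rng.normal(scale=0.3, size=(vocab, d)) for _ in target]

e = np.zeros(d)                  # trainable memory-token embedding

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

lr = 0.05
for step in range(3000):
    grad = np.zeros(d)
    for Wt, yt in zip(W, target):
        p = softmax(Wt @ e)
        p[yt] -= 1.0             # d(cross-entropy)/d(logits) = softmax - onehot
        grad += Wt.T @ p
    e -= lr * grad               # gradient step on the embedding only

# Greedy decoding from the optimized embedding recovers the sequence.
decoded = [int(np.argmax(Wt @ e)) for Wt in W]
print(decoded)
```

Because the loss is convex in `e` and the toy problem is overparameterized (48 embedding dimensions against 30 ranking constraints), gradient descent reliably drives the decoded sequence to match the target, mirroring the paper's observation that a single optimized embedding can store a sequence exactly.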

Keywords:
Computer science · Natural language processing · Sentence · Language model · Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 0
Citation Normalized Percentile: 0.12

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Text Readability and Simplification (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Tokens, embeddings and prompts: welcome to the world of large language models

Poulain, Pierre

Journal: Zenodo (CERN European Organization for Nuclear Research), Year: 2024
BOOK-CHAPTER

From Sentence Embeddings to Large Language Models to Detect and Understand Wordplay

Ryan Rony Dsilva

Journal: Lecture Notes in Computer Science, Year: 2024, Pages: 205-214