Text embedding models serve as a fundamental component in real-world search applications. However, their ranking fidelity remains limited compared to dedicated rerankers, especially recent LLM-based listwise rerankers, which capture fine-grained query-document and document-document interactions.
In this paper, we propose E2Rank (Efficient Embedding-based Ranking, also read as Embedding-to-Rank), a simple yet effective unified framework that extends a single text embedding model to perform both high-quality retrieval and listwise reranking through continued training under a listwise ranking objective, thereby achieving strong effectiveness with remarkable efficiency. Using cosine similarity between query and document embeddings as a unified ranking function, the listwise ranking prompt, constructed from the original query and its candidate documents, serves as an enhanced query enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance.
Empirically, E2Rank achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that E2Rank retains strong embedding ability on the MTEB benchmark.
Our work highlights the potential of single embedding models to serve as unified retrieval-reranking engines, offering a practical, efficient, and accurate alternative to complex multi-stage ranking systems.
E2Rank leverages pseudo-relevance feedback signals from the top-K retrieved documents to strengthen the query representation. Instead of treating each query-document pair independently, the listwise prompt is constructed by concatenating the query and its candidate documents into a single sequence, allowing the model to capture both query-document and document-document interactions within the candidate set.
During inference, cosine similarity between the pseudo query embedding (generated from the listwise prompt) and each standalone document embedding serves as the ranking function. This formulation keeps the model highly efficient while significantly improving listwise fidelity.
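To make this inference procedure concrete, below is a minimal sketch in Python. The `encode` function, the prompt template, and variable names are assumptions for illustration, not the paper's exact implementation; the key idea is that the listwise prompt is embedded once as a pseudo query, while standalone document embeddings can be precomputed offline.

```python
import numpy as np

def rerank_with_listwise_prompt(encode, query, candidates):
    """Sketch of E2Rank-style inference.

    encode: hypothetical callable mapping text -> unit-normalized embedding (d,).
    1. Build a listwise prompt from the query and its top-K candidates.
    2. Embed the prompt once to obtain a PRF-enhanced pseudo query embedding.
    3. Score each candidate by cosine similarity against its standalone embedding.
    """
    # The prompt template below is an assumption, not the paper's verbatim format.
    prompt = "Query: " + query + "\n" + "\n".join(
        f"Document {i + 1}: {doc}" for i, doc in enumerate(candidates)
    )
    pseudo_query_emb = encode(prompt)                      # (d,)
    doc_embs = np.stack([encode(d) for d in candidates])   # (K, d); reusable offline cache
    scores = doc_embs @ pseudo_query_emb                   # cosine similarity for unit-norm vectors
    order = np.argsort(-scores)
    return [(candidates[i], float(scores[i])) for i in order]
```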
The training of E2Rank follows a two-stage paradigm designed to retain strong retrieval capability while injecting listwise ranking signals into the embedding space.
Stage 1: Training the Embedding with Contrastive Learning
We first train the base embedding model with contrastive learning. This stage ensures strong initial retrieval accuracy and computational efficiency. The model trained in this stage is directly capable of standalone retrieval and serves as the initialization for the unified ranking framework.
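As a reference for Stage 1, the following is a standard in-batch contrastive (InfoNCE) loss, a common choice for embedding training; the paper's exact formulation (negative sampling, temperature, etc.) may differ.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_embs, doc_embs, temperature=0.05):
    """In-batch contrastive loss sketch for Stage 1.

    query_embs: (B, d) query embeddings.
    doc_embs:   (B, d) positive document embeddings; the document at the same
                batch index is the positive, all other in-batch documents act
                as negatives.
    """
    q = F.normalize(query_embs, dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    logits = q @ d.T / temperature                      # (B, B) scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)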
Stage 2: Listwise Ranking Enhancement
To further improve ranking fidelity within the retrieved candidate set, we perform continued training with a learning-to-rank objective, the RankNet loss. Given a query and its top-K retrieved documents, we construct a listwise prompt by concatenating them into a pseudo-relevance feedback query. During training, we optimize the cosine similarity between this enhanced query embedding and each candidate document embedding to align with ground-truth relevance labels. Contrastive learning is also retained in this stage. This stage injects rich ranking knowledge into the embedding model, enabling it to serve not only as a retriever but also as an effective and efficient listwise reranker.
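For completeness, a minimal RankNet-style pairwise loss is sketched below, applied to the cosine similarities between the PRF-enhanced query embedding and the candidate document embeddings; pair weighting and other training details are assumptions and may differ from the actual recipe.

```python
import torch
import torch.nn.functional as F

def ranknet_loss(scores, labels):
    """RankNet pairwise loss sketch for Stage 2.

    scores: (K,) cosine similarities between the enhanced query embedding
            and each candidate document embedding.
    labels: (K,) ground-truth relevance labels (higher = more relevant).
    """
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)                    # (K, K) score gaps s_i - s_j
    should_rank_higher = (labels.unsqueeze(1) > labels.unsqueeze(0)).float()
    # -log sigmoid(s_i - s_j) for every ordered pair, then keep only valid pairs.
    pair_loss = F.binary_cross_entropy_with_logits(
        diff, torch.ones_like(diff), reduction="none"
    )
    num_pairs = should_rank_higher.sum().clamp(min=1.0)
    return (pair_loss * should_rank_higher).sum() / num_pairs
```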
As a result, E2Rank unifies retrieval and reranking within a single model.
Effectiveness. As shown in Figure (b), E2Rank achieves state-of-the-art performance on the BEIR benchmark. It consistently outperforms RankQwen3, demonstrating that a single embedding model, when properly trained, can perform high-quality listwise reranking without relying on heavyweight generative architectures.
Additionally, E2Rank delivers strong results on the reasoning-intensive BRIGHT benchmark, without any RL training or synthetic training data, showcasing its versatility across diverse ranking scenarios.
Efficiency. E2Rank inherits the advantages of the embedding model: it supports batch inference and can encode document embeddings offline, further reducing online reranking latency. As shown in Figure (c), E2Rank significantly reduces inference latency across all model sizes compared to RankQwen3, achieving up to about a 5x speedup at the 8B scale while maintaining superior ranking performance. Even the E2Rank-8B model is faster than RankQwen3-0.6B.
Beyond the reranking gains, E2Rank also demonstrates strong embedding capabilities on the MTEB benchmark, with E2Rank-8B showing a slight average advantage over previous advanced models. Notably, compared with the variant trained only with contrastive learning, distilling richer ranking signals brings consistent and significant improvements on retrieval tasks, demonstrating the effectiveness of the ranking objective.