Add batching support for Marian language models #1897

@mustjab

Description

Describe the bug
Currently, Marian models are limited to single-sequence inference. Enabling batching would offer a significant throughput improvement and is critical for high-performance translation scenarios (e.g., Edge Translate).

Other Models (Llama, Phi, GPT, etc.)

  • Status: Fully Supported.
  • Details:
    • Models support both continuous and static batching (a usage sketch follows this list).
    • State and Model implementations handle [batch_size, seq_len] inputs natively.
    • KV caching is managed per-sequence.
    • Search strategies (Greedy, Beam) operate efficiently on the full batch.
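
For reference, batched generation with these already-supported models can be driven from the onnxruntime-genai Python API roughly as follows. This is a minimal sketch, not taken from the repo's examples: the model path and prompts are placeholders, and the exact calls vary between releases (older versions pass the token batch via GeneratorParams.input_ids rather than Generator.append_tokens).

```python
# Minimal sketch: batched generation via the onnxruntime-genai Python API.
# Model path and prompts are placeholders; calls may differ by release.
import onnxruntime_genai as og

model = og.Model("path/to/model")      # any model that already supports batching
tokenizer = og.Tokenizer(model)

prompts = [
    "First input sentence.",
    "A second, somewhat longer input sentence.",
]

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
# encode_batch pads the prompts into a single [batch_size, seq_len] batch.
generator.append_tokens(tokenizer.encode_batch(prompts))

while not generator.is_done():
    generator.generate_next_token()

# One generated sequence per prompt.
for i in range(len(prompts)):
    print(tokenizer.decode(generator.get_sequence(i)))
```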

Marian Models (Current Main Branch)

  • Status: Single Sequence Only (batch_size = 1).
  • Limitations:
    • MarianState::Run assumes single-sequence inputs.
    • Encoder/Decoder logic does not account for padding or batch dimensions (see the padding illustration after this list).
    • Attempting to run with batch_size > 1 results in shape mismatches or incorrect processing of padding tokens.
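
For context on the padding point above: a batch of variable-length sequences is normally right-padded into a rectangular [batch_size, seq_len] tensor together with an attention mask that marks real vs. padded positions, and the encoder/decoder has to skip the padded ones. A minimal illustration (plain NumPy; the token ids and pad id are hypothetical):

```python
# Plain-NumPy illustration of a padded batch; token ids and pad id are hypothetical.
import numpy as np

PAD_ID = 0
sequences = [[101, 57, 982, 12], [101, 7, 12]]   # two inputs of different lengths

seq_len = max(len(s) for s in sequences)
input_ids = np.full((len(sequences), seq_len), PAD_ID, dtype=np.int64)
attention_mask = np.zeros((len(sequences), seq_len), dtype=np.int64)

for row, seq in enumerate(sequences):
    input_ids[row, : len(seq)] = seq
    attention_mask[row, : len(seq)] = 1          # 1 = real token, 0 = padding

print(input_ids)        # [[101  57 982  12]
                        #  [101   7  12   0]]
print(attention_mask)   # [[1 1 1 1]
                        #  [1 1 1 0]]
```

A batched Marian path would need to consume a mask like this so that padded positions do not leak into the encoder states or the cross-attention.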

Desktop (please complete the following information):

  • OS: Win11
  • Browser: Edge
  • Version: 144

Additional context
Testing batching performance with a Python script showed a significant performance improvement for batch processing vs. sequential translation; an illustrative sketch of this kind of comparison appears below the results.

Windows Results

| Metric | Batch | Sequential | Difference |
|---|---|---|---|
| Total Time | 1.09s | 4.76s | 4.38x slower |
| Avg per Text | 0.011s | 0.049s | - |
| Time Wasted | - | 3.68s | 77.2% overhead |

Linux (WSL) Results

| Metric | Batch | Sequential | Difference |
|---|---|---|---|
| Total Time | 1.06s | 5.39s | 5.08x slower |
| Avg per Text | 0.011s | 0.056s | - |
| Time Wasted | - | 4.33s | 80.3% overhead |

ort_model_timing.py
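
The attached ort_model_timing.py is not reproduced here; a minimal, self-contained sketch of this kind of batch-vs-sequential comparison (again assuming the onnxruntime-genai Python API, with a hypothetical model path and workload) could look like:

```python
# Illustrative only - this is NOT the attached ort_model_timing.py.
# Assumes the onnxruntime-genai Python API; model path and workload are
# hypothetical, and the exact calls may differ between releases.
import time
import onnxruntime_genai as og

MODEL_PATH = "path/to/model"
TEXTS = ["Example sentence to translate."] * 100

model = og.Model(MODEL_PATH)
tokenizer = og.Tokenizer(model)

def generate(prompts, max_length=128):
    """One generation call over a (possibly batched) list of prompts."""
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode_batch(prompts))
    while not generator.is_done():
        generator.generate_next_token()
    return [tokenizer.decode(generator.get_sequence(i)) for i in range(len(prompts))]

# Sequential: one call per text.
t0 = time.perf_counter()
for text in TEXTS:
    generate([text])
sequential_s = time.perf_counter() - t0

# Batched: all texts in a single call.
t0 = time.perf_counter()
generate(TEXTS)
batch_s = time.perf_counter() - t0

print(f"Batch {batch_s:.2f}s | Sequential {sequential_s:.2f}s | "
      f"{sequential_s / batch_s:.2f}x slower sequentially")
```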
