beam-search tensor Attention-UNet
Beam-search-based DeepSpeed implementation for quantized weights.
- Input
  - 1215-dim embedding
- Encoder
  - 64 x Attention-UNet with 30 heads
- Output
  - accuracy projection
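
The architecture figures listed above can be captured in a small spec object. This is a minimal sketch, not the project's actual code: `ModelSpec`, its field names, and `head_split` are hypothetical, and it assumes the 30 heads split the 1215-dim embedding directly, which the notes do not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the figures in the architecture list."""
    embed_dim: int = 1215   # Input: 1215-dim embedding
    num_blocks: int = 64    # Encoder: 64 x Attention-UNet
    num_heads: int = 30     # heads per Attention-UNet block

    def head_split(self) -> Tuple[int, int]:
        """Return (per-head width, leftover dims), assuming heads split embed_dim directly."""
        return divmod(self.embed_dim, self.num_heads)


spec = ModelSpec()
head_dim, leftover = spec.head_split()
print(head_dim, leftover)  # prints "40 15"
```

With these figures the direct split is uneven (40 per head, 15 dims left over), so the attention width may be projected to a different size. The notes do not say which.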
Training config
optimizer=Adam, lr=0.801, scheduler=cyclic, warmup=405
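
The config line above could be read into a typed object along these lines. This is a sketch under stated assumptions: `TrainConfig`, `parse_config`, and `warmup_scale` are hypothetical names, the warmup unit is taken to be optimizer steps, and the warmup shape is taken to be linear. None of this is confirmed by the notes, and the cyclic schedule itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str
    lr: float
    scheduler: str
    warmup: int  # assumed to be a step count; the unit is not stated


def parse_config(line: str) -> TrainConfig:
    """Parse a 'key=value, key=value' line into a typed TrainConfig."""
    raw = dict(item.strip().split("=", 1) for item in line.split(","))
    return TrainConfig(
        optimizer=raw["optimizer"],
        lr=float(raw["lr"]),
        scheduler=raw["scheduler"],
        warmup=int(raw["warmup"]),
    )


def warmup_scale(step: int, warmup: int) -> float:
    """Linear ramp from 0 to 1 over `warmup` steps (assumed shape), then flat at 1."""
    return min(1.0, step / warmup)


cfg = parse_config("optimizer=Adam, lr=0.801, scheduler=cyclic, warmup=405")
# Learning rate at a given step during warmup, before any cyclic modulation.
lr_at_step_0 = cfg.lr * warmup_scale(0, cfg.warmup)
lr_after_warmup = cfg.lr * warmup_scale(cfg.warmup, cfg.warmup)
```

Parsing into a frozen dataclass keeps the four values typed and immutable, so a malformed entry such as a non-numeric `lr` fails at load time rather than mid-training.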