NumPy is not bad for what it does and has brought array programming into the mainstream while relieving the world of Matlab …
There are other broadcasting models which are more consistent, e.g., in modern APL or J. Built on the idea of verb rank and the leading-axis model, these provide a (more?) powerful notation for expressing many array operations concisely. For a proof of concept – based on stack and without further optimizations – you can try my JJ.jl:
julia> using JJ
julia> A = rand(5, 5, 100); # Note: switched to a trailing-axis model since Julia is column-major
julia> y = rand(5, 100);
julia> rank"2 \ 1"(A, y) |> size
(5, 100)
Here, rank"2 \ 1" indicates that \ now applies to matrices as left argument and vectors as right argument. If a batch is given, it simply broadcasts over that. A similar effect can be achieved with jax.vmap, which allows writing code for low-dimensional input arrays only and adding batch support simply when calling it. An advantage over notations with named indices, e.g., Einsum-like conventions, is that it is easier to generalize and applies equally to multiple batch dimensions:
julia> rank"2 \ 1"(rand(5, 5, 10, 8), rand(5, 10, 8)) |> size
(5, 10, 8)
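To make the batching explicit: the first example above, with a single trailing batch dimension, corresponds roughly to mapping \ over matching slices and re-stacking the results – a minimal sketch of what the rank notation automates (the eachslice/stack spelling is mine, not necessarily how JJ.jl implements it):
julia> slices = map((Ai, yi) -> Ai \ yi, eachslice(A; dims = 3), eachslice(y; dims = 2));  # solve one 5×5 system per batch element
julia> manual = stack(slices);  # reassemble the per-batch solutions
julia> size(manual)  # should agree with rank"2 \ 1"(A, y) up to floating point
(5, 100)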
Imho, the other – made up – example of D_{kn} = \frac{1}{L \cdot M} \sum_{l,m} A_{klm} B_{ln} C_{km} will probably always be easier to read (and write) with named indices, as the order of the axes is more or less arbitrary. In practice, a natural order is often available, e.g., distinguishing between batch, time, and embedding dimensions, and keeping that order consistent often makes it easier to express, e.g., via rank notation or vmap, which subset of the data each computation applies to.
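For illustration, with named indices this contraction reads almost verbatim in Julia as well – a minimal sketch assuming Tullio.jl (Einsum.jl would look much the same); the sizes are made up:
julia> using Tullio
julia> K, L, M, N = 4, 5, 6, 7;
julia> A = rand(K, L, M); B = rand(L, N); C = rand(K, M);
julia> @tullio D[k, n] := A[k, l, m] * B[l, n] * C[k, m];  # l and m appear only on the right, so they are summed over
julia> D ./= L * M;  # normalize by the number of summed terms
julia> size(D)
(4, 7)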
For what it's worth, here is a J line for the above using the provided axis order:
(+/%#)@,/"3 A * B */"1/~ C
A more consistent ordering might keep the batch dimensions K, N strictly before or after the “compute” dimensions L, M and vmap a function that is defined for L \times M matrices and L- and M-vectors only over those.
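Continuing the made-up example from above in plain Julia (the comprehension over k and n stands in for what vmap or rank notation would automate; the batch-first reordering and all names are just for illustration):
julia> Abat = A;  # already batch-first: Abat[k, l, m]
julia> Bbat = permutedims(B);  # reorder to batch-first: Bbat[n, l]
julia> Cbat = C;  # already batch-first: Cbat[k, m]
julia> kernel(mat, u, v) = sum(mat .* (u * v')) / (L * M);  # defined for one L×M matrix, one L-vector, and one M-vector only
julia> D2 = [kernel(Abat[k, :, :], Bbat[n, :], Cbat[k, :]) for k in 1:K, n in 1:N];  # map the low-rank kernel over the batch axes k and n
julia> D2 ≈ D  # same result as the named-index version above
true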