A numerical approach for solving gas dynamics on Cartesian grids is considered which employs an implicit time marching scheme with the matrix-free Lower-Upper Symmetric Gauss–Seidel (LU-SGS) method for solving discrete equations. Boundary conditions are treated with an embedded-boundary method. The method has two attractive features—(1) algorithmic uniformity of calculations and (2) structured memory accesses that well fit massively parallel architectures with GPU accelerators. We propose a novel CUDA+MPI computational algorithm scalable up to hundreds of GPUs and give in-depth analysis of its implementation (interoperability issues, libraries tuning).