Three years after the start of Image and Skybound's Energon Universe, G.I. Joe finally looks ready to recruit one of the Transformers into its ranks.
Abstract: Text-to-image (T2I) generation has been actively studied using Diffusion Models and Autoregressive Models. Recently, Masked Generative Transformers have gained attention as an alternative to ...
Abstract: Recent research on generating 3D assets from single-view inputs by diffusion models has attracted great attention. However, existing diffusion models face challenges such as lack of 3D ...
A fundamental challenge for GUI agents is robustly grounding natural language instructions, which requires not only precise spatial alignment (locating elements accurately) but also correct semantic ...