{"version": "1.0", "type": "rich", "title": "Basically there appears to be a missing level of complexity with these image generators as they probably see these images as 2D...", "author_name": "kontextmaschine", "author_url": "https://kontextmaschine.com", "provider_name": "kontextmaschine", "provider_url": "https://kontextmaschine.com", "url": "https://kontextmaschine.com/post/182652149443/", "html": "<p><a href=\"https://mitigatedchaos.tumblr.com/post/182652004972/basically-there-appears-to-be-a-missing-level-of\" class=\"tumblr_blog\" target=\"_blank\">mitigatedchaos</a>:</p>\n\n<blockquote><p>Basically there appears to be a missing level of complexity with these image generators as they probably see these images as 2D pixel grids and not as projections of 3-dimensional space onto a 2D surface, and don\u2019t engage in spatial reasoning about the 3D layout of a cat and the surrounding scene.</p><p>We can think of it as having developed a bunch of alien paint tools without understanding the underlying theory.</p><p>Can one of these models learn that? Well, more importantly, what\u2019s the factor difference in required resources?</p></blockquote>\n\n<p>has anyone (publicly) tried ML training on stereoscopic image pairs, or Magic Eye-type shit?</p>"}