Asset Generation

High-quality object meshes are essential for many use cases in movies, gaming, e-commerce, and AR/VR. In this work, we tackle the problem of generating a high-quality 3D object mesh from a single image. This is an ill-posed and challenging problem, as it requires reasoning about the object's 3D shape and texture from only a single 2D projection (image) of that object. Single-image object generation can simplify the tedious, manual object-creation process.

We achieve text-to-3D generation by utilizing a pretrained text-to-image diffusion model $\phi$ as an image prior to optimize the 3D representation parameterized by $\theta$. The image $x = g(\theta)$, rendered at random viewpoints by a volumetric renderer, is expected to represent a sample drawn from the text-conditioned image distribution $p(x \mid y)$ modeled by the pretrained diffusion model. The diffusion model $\phi$ is trained to predict the sampled noise $\epsilon_\phi(x_t; y, t)$ of the noisy image $x_t$ at noise level $t$, conditioned on the text prompt $y$. A score distillation sampling (SDS) loss encourages the rendered images to match the distribution modeled by the diffusion model. Specifically, the SDS loss computes the gradient:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta)) = \mathbb{E}_{t,\epsilon}\!\left[\,\omega(t)\,\big(\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta}\,\right]
$$

This is the per-pixel difference between the predicted and the added noise on the rendered image, propagated back to $\theta$ through the renderer Jacobian $\partial x / \partial \theta$, where $\omega(t)$ is a weighting function over noise levels.
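
The snippet below is a minimal PyTorch-style sketch of one SDS update, assuming a differentiable renderer and a frozen pretrained diffusion model. The callables `render`, `diffusion_model`, and `add_noise`, and the constant weighting, are hypothetical stand-ins, not part of any published Metakraft API.

```python
import torch

def sds_step(theta, render, diffusion_model, add_noise, text_embedding,
             optimizer, num_timesteps=1000):
    """One score-distillation-sampling (SDS) update on the 3D parameters theta.

    `render`, `diffusion_model`, and `add_noise` are hypothetical stand-ins for a
    differentiable volumetric renderer, a frozen pretrained text-to-image diffusion
    model, and its forward noising process.
    """
    # Render an image x = g(theta) from a random viewpoint.
    x = render(theta, viewpoint="random")

    # Sample a noise level t and corrupt the rendering with Gaussian noise.
    t = torch.randint(1, num_timesteps, (1,))
    eps = torch.randn_like(x)
    x_t = add_noise(x, eps, t)

    # Predict the noise with the frozen diffusion model (no gradients through it).
    with torch.no_grad():
        eps_pred = diffusion_model(x_t, text_embedding, t)

    # SDS gradient: omega(t) * (eps_pred - eps) * dx/dtheta.
    # Backpropagate the residual through the renderer only.
    w_t = 1.0  # weighting function omega(t); constant here for simplicity
    x.backward(gradient=w_t * (eps_pred - eps))

    optimizer.step()
    optimizer.zero_grad()
```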

One way to improve the generation quality of a conditional diffusion model is to use the classifier-free guidance (CFG) technique to steer the sampling slightly away from unconditional sampling, i.e.,

$$
\hat{\epsilon}_\phi(x_t; y, t) = \epsilon_\phi(x_t; y, t) + s\,\big(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; t, \emptyset)\big),
$$

where $s$ is the guidance scale and $\emptyset$ represents the "empty" text prompt.
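
A brief sketch of how CFG could be applied at each denoising step is shown below; `diffusion_model` and `empty_embedding` are hypothetical placeholders, and the guidance scale `s` is a tunable hyperparameter.

```python
def cfg_noise_prediction(diffusion_model, x_t, t, text_embedding, empty_embedding, s=7.5):
    """Classifier-free guidance: push the conditional noise prediction away from
    the unconditional one by the guidance scale s."""
    eps_cond = diffusion_model(x_t, text_embedding, t)     # conditioned on the prompt y
    eps_uncond = diffusion_model(x_t, empty_embedding, t)  # conditioned on the empty prompt
    return eps_cond + s * (eps_cond - eps_uncond)
```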

Text-to-image synthesis has been explored via models like Imagen [22] and DALL-E 3 [23]. We use diffusion models for the text-to-image task, as they tend to be faster. Many text-to-image models also give users control over the generated picture by prompting the model to change specific regions of an image via text, a process known as inpainting.

However, inpainting in these models is often limited to text-based control. Our method first leverages a state-of-the-art text-to-image generative model to produce a high-quality 2D image from a text prompt. In this way, we can exploit the full power of state-of-the-art 2D diffusion models to depict the intricate visual semantics described in the text, retaining the same creative freedom as 2D models.
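
As an illustration of this first step, the snippet below uses an open-source text-to-image pipeline (Stable Diffusion via the `diffusers` library) as a stand-in for the generative model described above; the model id, prompt, and filenames are examples only, and the model actually used by Metakraft may differ.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained open-source text-to-image model (example stand-in for the
# production text-to-image model described in this section).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a low-poly treasure chest, game asset, studio lighting"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("reference_view.png")  # used as the reference image for 3D lifting
```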

We then lift this image to 3D through cascaded stages of geometric sculpting and texture boosting. By decomposing the problem, we can apply specialized techniques at each stage. For geometry, we prioritize multi-view consistency and global 3D structure, allowing for some compromise on detailed textures. With the geometry fixed, we then focus solely on optimizing realistic and coherent texture, for which we jointly learn a 3D-aware diffusion prior that bootstraps the 3D optimization. In what follows, we elaborate on the key design considerations for the two phases.
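
The two-phase decomposition could be organized roughly as in the sketch below. The stage functions `sculpt_geometry` and `boost_texture` are hypothetical placeholders supplied by the caller, not part of any published Metakraft API.

```python
def image_to_3d(reference_image, text_prompt, sculpt_geometry, boost_texture):
    """Cascaded lifting of a single 2D image to a textured 3D asset.

    `sculpt_geometry` and `boost_texture` are caller-supplied (hypothetical)
    implementations of the two stages described above.
    """
    # Stage 1 (geometric sculpting): recover a multi-view-consistent global
    # 3D structure, tolerating coarse textures.
    geometry = sculpt_geometry(reference_image, text_prompt)

    # Stage 2 (texture boosting): with the geometry fixed, refine realistic,
    # coherent texture, guided by a jointly learned 3D-aware diffusion prior.
    texture = boost_texture(geometry, reference_image, text_prompt)

    return geometry, texture
```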

Fig. The flow by which Metakraft converts a text prompt into a 3D model.