⚠️ Attention: Please use CFG = 11 as the default for testing.
For the full release note, see https://nieta-art.feishu.cn/wiki/PpwqwVDzjiNE5kkUhRtcEsn6nmh

I. Overview

Introducing Neta Art XL V1.0, the easiest-to-use SDXL anime model so far.

Keywords: Best Character Coverage, Vivid Storytelling, Diverse Styles, Stable Anatomy.

Major motivation:
- Better stability and anatomy for character-driven visual storytelling:
  - An ordered prompt guide that makes prompts easier for the model to follow;
  - A very good balance between broader knowledge and stability.
- Maintain a high aesthetic ceiling across versatile anime art styles, while keeping baseline outputs appealing for general users.
- Fewer LoRAs needed for characters / styles / artists, so we can make better use of static model acceleration techniques.

Character coverage: refer to both the A3.1 lists and the release note.

Prompting Guide

To avoid possible ambiguity in text prompts, and to leave room for very complicated scenes such as multi-character ones, we found that enforcing an order in prompts leads to better instruction-following behavior (learned from NAI3 / Animagine3 / AIDXL).
Specifically, we use the following order in Neta Art XL:

Tag Order: subject (1boy / 1girl) -> character (a girl named frieren from sousou no frieren series) -> artist trigger (by xxx) -> race (elf) -> composition (cowboy shot) -> style (impasto style) -> theme (fantasy theme) -> main environment (in the forest, at day) -> background (gradient background) -> action (sitting on ground) -> expression (is expressionless) -> main characteristics (white hair) -> body characteristics (twintails, green eyes, parted lip) -> clothing (wearing a white dress) -> clothing accessories (frills) -> other items (a cat) -> secondary environment (grass, sunshine) -> aesthetics (beautiful color, detailed, aesthetic) -> quality ((best quality:1.3))

Negative prompts: (worst quality:1.3), low quality, lowres, messy, abstract, ugly, disfigured, bad anatomy, draft, deformed hands, fused fingers, signature, text, multi views

Sampler: Euler a (normal) as default, 28+ steps recommended.

One additional merit of Neta Art XL is that it supports a very wide range of CFG values (5 - 20, compared to 7 - 9 for previous models). While we found empirically that a higher CFG leads to more detail and higher contrast, CFG 9 - 14 (important!) generally gives the best results.

II. Highlight: Style Versatility

We carefully selected 13 style keys that have good orthogonality and are commonly used in many scenarios, justified by usage data from Nieta AI (30M+ generations).

Having orthogonal styles means each style is effectively different from the others, allowing you to easily combine them and create new styles without interference.

Neta Art XL also includes a long list of artist styles, activated through a "by xxx" clause. Please refer to https://civitai.com/models/124189/anime-illust-diffusion-xl for a complete list of supported artists.

III. Expression, Posing, and Camera Angles

Compared to other models, Neta excels at maintaining stability, prompt-following ability, and anatomical accuracy, even with challenging poses or camera angles that would cause degradation in other models. We compared our results to the second-best candidate models to highlight Neta's advantages in these areas.

IV. Multi-Character Scenes

Neta Art XL demonstrated good stability for multi-character scenes.

V. Text & Typography

Neta Art XL renders poster-like text with a good success rate.

VI. Training

- Data annotation combining multiple sources (original prompt, CogVLM captions, WaifuTagger tags).
- Post-processing techniques such as semantic deduplication and hierarchical tag organization:
  - Semantic Deduplication: removes redundant tags by detecting when a higher-level tag (e.g. very long hair) semantically covers a lower-level one (e.g. long hair).
  - Tag Layering Algorithm: tags are organized into hierarchical layers based on their priorities and related semantics (e.g. "by wlop" influences the styling of the whole picture, while "frills" influences only a small fraction of it). More dominant tags are placed in higher layers to prioritize their influence during training.
- Dataset management tool from https://github.com/Eugeoter/waifuset
- High-quality regularization data from AIDXL: regularization datasets with "best" and "amazing" quality ratings from AIDXL. These datasets are manually selected and come with detailed annotations and natural language descriptions.
- Finetuning on more knowledgeable base models like AAM, and blending with AnimagineXL 3.1 character knowledge.

Challenges Faced:
- Imbalance in learning different styles
- Poor generalization of some styles to diverse scenes
- Lack of details/texture in generations
- Trigger word overlap with base model knowledge

Solutions Explored:
- Data reweighting to balance style learning, and supplementing diverse data per style.
- Tuning sampling hyperparameters like minimum gamma and rectified flow.
  Rectified flow is a training parameter that increases the sampling frequency at the middle timesteps but reduces the weight given to learning small noise at the low timesteps. This technique helps improve the model's ability to reproduce styles, but requires a knowledge-rich base model.
- Randomizing / dropping trigger words during training.

VII. Evaluation

Neta Art XL excels other models in
See https://nieta-art.feishu.cn/wiki/PpwqwVDzjiNE5kkUhRtcEsn6nmh for the full evaluation.

VIII. License

Developed with ❤️ by: Neta.art Lab - https://civitai.com/user/nieta_art

In collaboration with:
- Euge: https://civitai.com/user/Euge_
- 汤人烂: https://space.bilibili.com/8594480
- Chenkin: https://civitai.com/user/Chenkin
- Bo Dai: https://daibo.info/

Thanks to:
- https://blog.novelai.net/introducing-novelai-diffusion-anime-v3-6d00d1c118c3
- https://cagliostrolab.net/posts/animagine-xl-v3-release
- https://civitai.com/models/269232/aam-xl-anime-mix
- https://civitai.com/models/124189/anime-illust-diffusion-xl
- https://github.com/deepghs/waifuc

Model type: Diffusion-based text-to-image generative model

License: We merged 0.05 CLIP and 0.15 UNet input layers from Animagine 3.1, so the model is released under the Fair AI Public License 1.0-SD.

IX. Conclusion and Future Work

Shortcomings:
- Some characters are underfitted.
- Styles are not activated well with long prompts.
- Certain styles appear grayish at low CFG and with short prompts. This is partly explained in https://civitai.com/articles/4969.

Future Work:
- Prepare larger training sets and more knowledge-based data to improve character, style, and detail handling.
- We welcome others to join the discussion, provide suggestions, and contribute to the model's advancement.

Neta Art XL 2.0 is on the way. Stay tuned, and test our product for FREE: http://neta.art/

Discord: https://discord.gg/AtRtbe9W8w
Twitter: https://twitter.com/netaart_ai
Civitai: https://civitai.com/user/nieta_art
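
As an illustration of the tag order described in the Prompting Guide, here is a minimal Python sketch that assembles a prompt in that order. The category names, the `build_prompt` helper, and the choice of ", " as a separator are assumptions made for this example and are not part of any Neta Art XL tooling. Only the ordering itself comes from the guide.

```python
# Tag categories in the order listed in the Prompting Guide.
# The snake_case key names are hypothetical labels chosen for this sketch.
TAG_ORDER = [
    "subject", "character", "artist", "race", "composition", "style",
    "theme", "main_environment", "background", "action", "expression",
    "main_characteristics", "body_characteristics", "clothing",
    "clothing_accessories", "other_items", "secondary_environment",
    "aesthetics", "quality",
]


def build_prompt(tags):
    """Join the given tags in guide order, skipping empty or missing categories."""
    unknown = set(tags) - set(TAG_ORDER)
    if unknown:
        raise ValueError(f"unknown tag categories: {sorted(unknown)}")
    # Joining with ", " is an assumption; the guide does not fix a separator.
    return ", ".join(tags[key] for key in TAG_ORDER if tags.get(key))


# Categories may be supplied in any order; the helper reorders them.
example = {
    "quality": "(best quality:1.3)",
    "composition": "cowboy shot",
    "character": "a girl named frieren from sousou no frieren series",
    "subject": "1girl",
    "race": "elf",
}
prompt = build_prompt(example)
print(prompt)
```

Keeping the order in a single list means a prompt can be filled in piece by piece (for example from a form or a dictionary of presets) without having to remember where each category belongs.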
