Featured AI
-
Netflix Eyeline-Research Go-with-the-Flow – An easy and efficient way to control the motion patterns of video diffusion models
https://github.com/Eyeline-Research/Go-with-the-Flow
https://huggingface.co/Eyeline-Research/Go-with-the-Flow/tree/main
-
DimensionX – Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
https://chenshuo20.github.io/DimensionX
https://github.com/wenqsun/DimensionX
https://huggingface.co/spaces/fffiloni/DimensionX
https://huggingface.co/wenqsun/DimensionX/tree/main
-
Tencent Hunyuan3D – an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets
https://github.com/tencent/Hunyuan3D-2
Hunyuan3D 2.0 is an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model – Hunyuan3D-DiT, and a large-scale texture synthesis model – Hunyuan3D-Paint.
The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio – a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets.
It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including both open-source and closed-source models, in geometry detail, condition alignment, texture quality, etc.
-
Invoke.com – The Gen AI Platform for Pro Studios
Invoke is a powerful, secure, and easy-to-deploy generative AI platform for professional studios to create visual media. Train models on your intellectual property, control every aspect of the production process, and maintain complete ownership of your data, in perpetuity.
-
How does Stable Diffusion work?
https://stable-diffusion-art.com/how-stable-diffusion-work/
Stable Diffusion is a latent diffusion model that generates AI images from text. Instead of operating in the high-dimensional image space, it first compresses the image into the latent space.
Stable Diffusion belongs to a class of deep learning models called diffusion models. They are generative models, meaning they are designed to generate new data similar to what they have seen in training. In the case of Stable Diffusion, the data are images.
Why is it called the diffusion model? Because its math looks very much like diffusion in physics. Let’s go through the idea.
To reverse the diffusion, we need to know how much noise was added to an image. The answer is to teach a neural network model to predict the added noise. It is called the noise predictor in Stable Diffusion, and it is a U-Net model.
After training, we have a noise predictor capable of estimating the noise added to an image.
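As a rough illustration of that training objective (a minimal sketch, not the actual Stable Diffusion training code; unet, latents, and text_embeddings are placeholder names, and the scheduler call follows the Hugging Face diffusers API):

# Minimal sketch of the noise-prediction training step (assumes torch + diffusers).
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def training_step(unet, latents, text_embeddings):
    # Pick a random timestep and random Gaussian noise for each example
    t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
    noise = torch.randn_like(latents)
    # Forward diffusion: produce the noisy latent at timestep t
    noisy_latents = scheduler.add_noise(latents, noise, t)
    # The noise predictor (U-Net) tries to recover the noise that was added
    noise_pred = unet(noisy_latents, t, encoder_hidden_states=text_embeddings).sample
    # Train by minimising the error between predicted and actual noise
    return F.mse_loss(noise_pred, noise)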
Diffusion models like Google’s Imagen and OpenAI’s DALL-E operate in pixel space. They use some tricks to make the models faster, but it is still not enough.
Stable Diffusion is designed to solve the speed problem. Here’s how.
Stable Diffusion is a latent diffusion model. Instead of operating in the high-dimensional image space, it first compresses the image into the latent space. The latent space is 48 times smaller so it reaps the benefit of crunching a lot fewer numbers.
This is done using a technique called the variational autoencoder. Yes, that is precisely what the VAE files are, but I will make it crystal clear later.
The Variational Autoencoder (VAE) neural network has two parts: (1) an encoder and (2) a decoder. The encoder compresses an image to a lower dimensional representation in the latent space. The decoder restores the image from the latent space.
You may wonder why the VAE can compress an image into a much smaller latent space without losing information. The reason is that, unsurprisingly, natural images are not random. They have high regularity: a face follows a specific spatial relationship between the eyes, nose, cheek, and mouth; a dog has four legs and a particular shape.
In other words, the high dimensionality of images is artifactual. Natural images can be readily compressed into the much smaller latent space without losing any information. This is called the manifold hypothesis in machine learning.
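A minimal sketch of that compression step with the AutoencoderKL class from Hugging Face diffusers (the model id and the 0.18215 scaling factor follow the common SD v1.5 convention and are assumptions here): a 512×512×3 image holds 786,432 values, while the 64×64×4 latent holds 16,384, which is where the 48× figure above comes from.

# Sketch: compress an image into the latent space and restore it with the VAE.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # stand-in for a real, normalised RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample() * 0.18215  # -> (1, 4, 64, 64)
    restored = vae.decode(latents / 0.18215).sample             # -> (1, 3, 512, 512)

print(image.numel() / latents.numel())  # 786432 / 16384 = 48.0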
Where does the text prompt enter the picture?
This is where conditioning comes in. The purpose of conditioning is to steer the noise predictor so that the predicted noise will give us what we want after it is subtracted from the image.
The text prompt is not the only way a Stable Diffusion model can be conditioned. ControlNet conditions the noise predictor with detected outlines, human poses, etc., and achieves excellent control over image generation.
This write-up won’t be complete without explaining Classifier-Free Guidance (CFG), a value AI artists tinker with every day. To understand what it is, we will need to first touch on its predecessor, classifier guidance…
The classifier guidance scale is a parameter that controls how closely the diffusion process should follow the label.
Classifier-free guidance, in its authors’ terms, is a way to achieve “classifier guidance without a classifier”. Instead of relying on a separate image classifier, the conditioning is folded into the noise predictor U-Net itself, achieving the so-called “classifier-free” guidance in image generation.
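In sampling code, classifier-free guidance comes down to a couple of lines: run the noise predictor with and without the text conditioning, then push the prediction away from the unconditional result by the guidance scale. A schematic sketch (names are illustrative, not any library's exact internals):

def cfg_noise(unet, noisy_latents, t, prompt_emb, uncond_emb, guidance_scale=7.5):
    # Predict the noise twice: once without the prompt, once with it
    noise_uncond = unet(noisy_latents, t, encoder_hidden_states=uncond_emb).sample
    noise_cond = unet(noisy_latents, t, encoder_hidden_states=prompt_emb).sample
    # Move the prediction away from the unconditional result, toward the prompt
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)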
The SDXL model is the official upgrade to the v1 and v2 models. The model is released as open-source software. The total number of parameters of the SDXL model is 6.6 billion, compared with 0.98 billion for the v1.5 model.
The SDXL model is, in practice, two models. You run the base model, followed by the refiner model. The base model sets the global composition. The refiner model adds finer details.
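A hedged sketch of that two-stage run with Hugging Face diffusers (model ids and the denoising_end/denoising_start hand-off follow the documented diffusers pattern; the exact step split is just an example):

# Sketch: SDXL base sets the composition, the refiner adds the finer details.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse on a cliff at sunset, cinematic lighting"
# Base model handles roughly the first 80% of the denoising steps
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# Refiner picks up from there and finishes the remaining steps
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
image.save("sdxl_result.png")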
More about Generative AI here -
IPAdapter – Text Compatible Image Prompt Adapter for Text-to-Image / Image-to-Image Diffusion Models and ComfyUI implementation
github.com/tencent-ailab/IP-Adapter
The IPAdapter models are very powerful for image-to-image conditioning. The subject or even just the style of the reference image(s) can easily be transferred to a generation. Think of it as a one-image LoRA. IP-Adapter is an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance than a fine-tuned image prompt model.
Once the IP-Adapter is trained, it can be directly reused on custom models fine-tuned from the same base model. The IP-Adapter is fully compatible with existing controllable tools, e.g., ControlNet and T2I-Adapter.
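A minimal usage sketch of that idea via the diffusers integration (model ids, weight file name, and the local reference image path are assumptions based on the commonly published h94/IP-Adapter weights):

# Sketch: condition a text-to-image pipeline on a reference image with IP-Adapter.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)  # how strongly the reference image steers the result

reference = load_image("reference_style.png")  # hypothetical local file
image = pipe(
    prompt="a portrait in the style of the reference image",
    ip_adapter_image=reference,
).images[0]
image.save("ip_adapter_result.png")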
-
LatentSync – Audio Conditioned Latent Diffusion Models for Lip Sync + ComfyUI model
https://huggingface.co/spaces/fffiloni/LatentSync
https://github.com/bytedance/LatentSync
https://github.com/ShmuelRonen/ComfyUI-LatentSyncWrapper
https://www.gyan.dev/ffmpeg/builds
-
DiffSensei – Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Comic Book Generation
https://jianzongwu.github.io/projects/diffsensei
https://github.com/jianzongwu/DiffSensei
https://huggingface.co/jianzongwu/DiffSensei
-
László Gaál – Google Veo2 tests
https://www.linkedin.com/posts/laszloga_veo2-activity-7278344748464029696-z18_
https://www.linkedin.com/posts/laszloga_veo2-veo2-activity-7279424228779507712-zDgC
https://www.linkedin.com/posts/laszloga_veo2-activity-7280530104722583552-tGgJ
https://www.linkedin.com/posts/laszloga_veo2-activity-7280881794663510016-e8i8
https://www.linkedin.com/posts/laszloga_veo2-activity-7277947758932606976-7i9
https://www.linkedin.com/posts/laszloga_veo2-activity-7283050136446935041-EJGs
-
ComfyUI + InstantID SDXL – Face and body swap tutorials
https://github.com/cubiq/ComfyUI_InstantID
https://github.com/cubiq/ComfyUI_InstantID/tree/main/examples
https://github.com/deepinsight/insightface
Unofficial version https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID
Installation details under the post
-
ComfyUI Tutorial Series Ep 25 – LTX Video – Fast AI Video Generator Model
https://comfyanonymous.github.io/ComfyUI_examples/ltxv
LTX-Video 2B v0.9.1 Checkpoint model
https://huggingface.co/Lightricks/LTX-Video/tree/main
More details under the post
-
The AI-Copyright Trap document by Carys Craig
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4905118
“There are many good reasons to be concerned about the rise of generative AI(…). Unfortunately, there are also many good reasons to be concerned about copyright’s growing prevalence in the policy discourse around AI’s regulation. Insisting that copyright protects an exclusive right to use materials for text and data mining practices (whether for informational analysis or machine learning to train generative AI models) is likely to do more harm than good. As many others have explained, imposing copyright constraints will certainly limit competition in the AI industry, creating cost-prohibitive barriers to quality data and ensuring that only the most powerful players have the means to build the best AI tools (provoking all of the usual monopoly concerns that accompany this kind of market reality but arguably on a greater scale than ever before). It will not, however, prevent the continued development and widespread use of generative AI.”
…
“(…) As Michal Shur-Ofry has explained, the technical traits of generative AI already mean that its outputs will tend towards the dominant, likely reflecting ‘a relatively narrow, mainstream view, prioritizing the popular and conventional over diverse contents and narratives.’ Perhaps, then, if the political goal is to push for equality, participation, and representation in the AI age, critics’ demands should focus not on exclusivity but inclusivity. If we want to encourage the development of ethical and responsible AI, maybe we should be asking what kind of material and training data must be included in the inputs and outputs of AI to advance that goal. Certainly, relying on copyright and the market to dictate what is in and what is out is unlikely to advance a public interest or equality-oriented agenda.”
…
“If copyright is not the solution, however, it might reasonably be asked: what is? The first step to answering that question—to producing a purposively sound prescription and evidence-based prognosis, is to correctly diagnose the problem. If, as I have argued, the problem is not that AI models are being trained on copyright works without their owners’ consent, then requiring copyright owners’ consent and/or compensation for the use of their work in AI-training datasets is not the appropriate solution. (…) If the only real copyright problem is that the outputs of generative AI may be substantially similar to specific human-authored and copyright-protected works, then copyright law as we know it already provides the solution.”
-
xinsir – controlnet-union-sdxl-1.0 examples
https://huggingface.co/xinsir/controlnet-union-sdxl-1.0
deblur
inpainting
outpainting
upscale
openpose
depthmap
canny
lineart
anime lineart
mlsd
scribble
hed
softedge
ted
segmentation
normals
openpose + canny
-
What is deepfake GAN (Generative Adversarial Network) technology?
https://www.techtarget.com/whatis/definition/deepfake
Deepfake technology is a type of artificial intelligence used to create convincing fake images, videos and audio recordings. The term describes both the technology and the resulting bogus content and is a portmanteau of deep learning and fake.
Deepfakes often transform existing source content where one person is swapped for another. They also create entirely original content where someone is represented doing or saying something they didn’t do or say.
Deepfakes aren’t edited or photoshopped videos or images. In fact, they’re created using specialized algorithms that blend existing and new footage. For example, subtle facial features of people in images are analyzed through machine learning (ML) to manipulate them within the context of other videos.
Deepfake creation uses two algorithms — a generator and a discriminator — to create and refine fake content. The generator builds a training data set based on the desired output, creating the initial fake digital content, while the discriminator analyzes how realistic or fake the initial version of the content is. This process is repeated, enabling the generator to improve at creating realistic content and the discriminator to become more skilled at spotting flaws for the generator to correct.
The combination of the generator and discriminator algorithms creates a generative adversarial network.
A GAN uses deep learning to recognize patterns in real images and then uses those patterns to create the fakes.
When creating a deepfake photograph, a GAN system views photographs of the target from an array of angles to capture all the details and perspectives.
When creating a deepfake video, the GAN views the video from various angles and analyzes behavior, movement and speech patterns.
This information is then run through the discriminator multiple times to fine-tune the realism of the final image or video.
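The generator/discriminator loop described above is the standard GAN training pattern. A compact, generic PyTorch sketch (toy fully-connected networks, not a deepfake-specific architecture):

# Minimal generic GAN training step: the generator and discriminator compete.
import torch
import torch.nn as nn

latent_dim, image_dim = 100, 64 * 64 * 3
G = nn.Sequential(nn.Linear(latent_dim, 1024), nn.ReLU(), nn.Linear(1024, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):  # real_images: (batch, image_dim), scaled to [-1, 1]
    batch = real_images.shape[0]
    # 1) Discriminator: label real images 1 and generated images 0
    fake = G(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(D(real_images), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Generator: try to make the discriminator classify its fakes as real
    fake = G(torch.randn(batch, latent_dim))
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()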