cjwbw

Models by this creator

rembg

cjwbw

Total Score

5.3K

Removes image backgrounds

Updated 4/28/2024

clip-vit-large-patch14

cjwbw

Total Score

4.5K

openai/clip-vit-large-patch14 with Transformers

Updated 4/28/2024

anything-v3-better-vae

cjwbw

Total Score

3.4K

High-quality, highly detailed anime-style Stable Diffusion with a better VAE

Updated 4/28/2024

zoedepth

cjwbw

Total Score

3.4K

ZoeDepth: Combining relative and metric depth

Updated 4/28/2024

anything-v4.0

cjwbw

Total Score

3.0K

High-quality, highly detailed anime-style Stable Diffusion models

Updated 4/28/2024

real-esrgan

cjwbw

Total Score

1.4K

Real-ESRGAN: Real-World Blind Super-Resolution

Updated 4/28/2024

dreamshaper

cjwbw

Total Score

1.2K

DreamShaper Stable Diffusion model

Updated 4/28/2024

waifu-diffusion

cjwbw

Total Score

1.1K

Stable Diffusion fine-tuned on Danbooru images

Updated 4/28/2024

cogvlm

cjwbw

Total Score

533.724

CogVLM is a powerful open-source visual language model developed by the maintainer cjwbw. It comprises a vision transformer encoder, an MLP adapter, a pretrained large language model, and a visual expert module. CogVLM-17B has 10 billion vision parameters and 7 billion language parameters, and it achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, and more. It can also engage in conversational interactions about images. Similar models include segmind-vega, an open-source distilled Stable Diffusion model with a 100% speedup; animagine-xl-3.1, an anime-themed text-to-image Stable Diffusion model; cog-a1111-ui, a collection of anime Stable Diffusion models; and videocrafter, a text-to-video and image-to-video generation and editing model.

Model inputs and outputs

CogVLM accepts both text and image inputs. It can generate detailed image descriptions, answer various types of visual questions, and engage in multi-turn conversations about images.

Inputs

Image: the input image that CogVLM will process and respond to.
Query: the text prompt or question that CogVLM uses to generate a response about the input image.

Outputs

Text response: the text generated by CogVLM from the input image and query.

Capabilities

CogVLM accurately describes images in detail with very few hallucinations. It can understand and answer various types of visual questions, and a visual grounding variant can ground the generated text to specific regions of the input image. CogVLM sometimes captures more detailed content than GPT-4V(ision).

What can I use it for?

With its strong visual and language understanding, CogVLM suits applications such as image captioning, visual question answering, image-based dialogue systems, and more. Developers and researchers can leverage it to build multimodal AI systems that process and understand both visual and textual information.

Things to try

One interesting aspect of CogVLM is its ability to hold multi-turn conversations about images: provide a series of related queries about a single image and observe how the model maintains context across turns. You can also experiment with different prompting strategies on tasks such as detailed image description, visual reasoning, and visual grounding.
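As a rough illustration of the input/output contract and the multi-turn usage described above, here is a minimal Python sketch. The field names ("image", "query") mirror the Inputs list in this description, but the exact request schema of any hosted CogVLM endpoint, and the Question/Answer history format, are assumptions for illustration only.

```python
import json

# Sketch of the CogVLM input contract described above.
# Field names follow the "Inputs" list (image, query); the multi-turn
# history format below is an assumption, not a documented schema.

def build_request(image_url: str, query: str) -> str:
    """Build a JSON body carrying the two inputs CogVLM expects."""
    return json.dumps({"input": {"image": image_url, "query": query}})

def fold_history(history, new_query: str) -> str:
    """Fold earlier (query, answer) turns into one query string so the
    model can keep context across a multi-turn conversation."""
    turns = [f"Question: {q}\nAnswer: {a}" for q, a in history]
    turns.append(f"Question: {new_query}\nAnswer:")
    return "\n".join(turns)

history = [("What animal is in the photo?", "A grey tabby cat.")]
query = fold_history(history, "What is it sitting on?")
print(build_request("https://example.com/cat.jpg", query))
```

Folding prior turns into the query text is one simple way to experiment with the conversational behavior noted under "Things to try" when an endpoint only accepts a single image and a single prompt per call.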

Updated 4/28/2024

rudalle-sr

cjwbw

Total Score

464.132

Real-ESRGAN super-resolution model from ruDALL-E

Updated 4/28/2024