AI Enthusiast Weekly (2024-08-19): I-SHEEP: Enhancing LLMs through Iterative Self-Alignment
I-SHEEP: Enhancing LLMs through Iterative Self-Alignment
I-SHEEP is an iterative self-enhancement paradigm that continuously pushes large language models (LLMs) to self-align, mimicking human learning. Unlike one-time alignment methods, it repeats the alignment process, and the reported improvements are significant: 78.2% on AlpacaEval, 24.0% on MT-Bench, and 8.88% in IFEval accuracy. It also improves results on standard benchmarks, raising code generation by 24.77%, TriviaQA by 12.04%, and SQuAD by 20.29%. The method's code and resources are publicly available.
AI News
ChemVLM: Bridging Vision and Chemistry with AI
ChemVLM is a large multimodal model for chemistry. It follows a ViT-MLP-LLM architecture, built on ChemLLM-20B and InternViT-6B, and handles both chemical images and text. The authors created a specialized dataset for it, and in their tests the model performed strongly, leading the benchmarks they evaluated. It is available on Hugging Face.
Linly-Dubbing: Open-Source AI for Multilingual Video Translation
Linly-Dubbing: An open-source tool for AI-driven dubbing and video translation. It translates videos into multiple languages and generates subtitles.
Open-source: Software available for free, with its code accessible to modify or distribute.
AI-driven: Uses artificial intelligence to perform tasks traditionally done by humans.
xGen-MM (BLIP-3): Open Toolkit for Large Multimodal Models
xGen-MM, or BLIP-3, is a toolkit designed for constructing Large Multimodal Models (LMMs). It encompasses datasets, training methodologies, model architectures, and a variety of LMMs. These models have been evaluated across multiple tasks, demonstrating strong learning capabilities and competitive results. A specific model has been optimized for safety, minimizing problems such as hallucinations. All resources, including datasets and code, are available as open-source to foster LMM research.
LMMs: Models capable of processing both text and images.
Hallucinations: Errors in which models produce incorrect or misleading information.
Google's AI-Powered Screenshots App Enhances Pixel 9 Management
Google's new Screenshots app, exclusive to Pixel 9 series devices, simplifies managing large collections of screenshots. Using Google's Gemini Nano model, the app creates a title and summary for each screenshot, making them easier to search.
In a demo, a screenshot of a hair dyeing guide was summarized under a generated headline. Users can search by typing or speaking queries, such as asking for a Wi-Fi code, and the app quickly surfaces the relevant screenshots.
Additional features include suggested actions, such as "Search in Maps" for screenshots with addresses, and the ability to set reminders for unread articles.
This tool, alongside other AI features like Made You Look and Gemini Live, marks a significant leap in practical AI applications, making daily tasks more efficient.
Deep-Live-Cam Software Enables Realistic Face Swapping
Deep-Live-Cam, a new piece of software, has gone viral. It swaps a face from a photo onto a live webcam feed, tracking pose, lighting, and expressions. Videos imitating Elon Musk and J.D. Vance spread quickly.
The software is open source and tops GitHub's trending list. It combines several existing tools. A key component, "inswapper," was trained on a vast dataset of facial images. This AI model captures facial structure and dynamics, separating identity from pose and expression to enable realistic swaps.
This technology raises concerns about deception. Its rapid development underscores the ease of remote fraud. The implications are clear: technology's pace outstrips our ethical considerations.
Premier League Turns to iPhones and Machine Learning for Offside Decisions
The English Premier League is overhauling its offside detection technology. In collaboration with Genius Sports, it is using iPhones and machine learning to assist referees. Offside decisions are often ambiguous, and although the VAR system has been in use for several years, it still causes delays and errors.
Genius's semi-automated offside technology (SAOT) uses 24 to 28 iPhone 15 Pro devices to cover the entire field and generate 3D models of the players. It captures up to 10,000 data points in rich detail and operates at a frame rate of 200 fps. The data is processed through GeniusIQ to identify body parts and predict positions.
This technology aims to precisely determine the positional relationships between players, the ball, and the goalkeeper. Offside calls are made at the moment the ball is kicked, with more frames captured to enhance accuracy. Compared to traditional VAR, SAOT offers more detailed insights, although its performance remains to be validated. The Premier League plans to fully adopt this technology by the end of the year and continue its use through the end of the season.
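The positional check described above can be illustrated with a minimal sketch. This is not Genius Sports' implementation: it assumes each player is reduced to a single x-coordinate (real systems track individual body parts), and the names Snapshot and is_offside_position are hypothetical. It only encodes the basic offside-position rule at the moment the ball is played: the attacker must be in the opponents' half and nearer to the opponents' goal line than both the ball and the second-last opponent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Snapshot:
    """Positions at the moment the ball is played (hypothetical structure).

    x grows toward the defending team's goal line; the halfway line is x = 0.
    """
    attacker_x: float
    ball_x: float
    defender_xs: Tuple[float, ...]  # all defenders, goalkeeper included


def is_offside_position(s: Snapshot, halfway_x: float = 0.0) -> bool:
    """Return True if the attacker is in an offside position."""
    if len(s.defender_xs) < 2:
        raise ValueError("need at least two defenders to find the second-last one")
    # A player in their own half (or on the halfway line) cannot be offside.
    if s.attacker_x <= halfway_x:
        return False
    # The second-last opponent is the second-closest defender to the goal line.
    second_last = sorted(s.defender_xs, reverse=True)[1]
    # Being level with the ball or the second-last opponent is not offside.
    return s.attacker_x > s.ball_x and s.attacker_x > second_last
```

For example, is_offside_position(Snapshot(40.0, 30.0, (50.0, 38.0, 20.0))) evaluates to True, because the attacker at 40 is beyond both the ball at 30 and the second-last defender at 38.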
Tools
Polymarket Integrates Perplexity for Enhanced News Summaries
Polymarket, a betting platform, has partnered with Perplexity, an AI search engine. Polymarket users will see news summaries about the events listed on the platform. Perplexity plans more partnerships and uses AI for data visualizations.
Polymarket's CEO sees Perplexity as a source of trusted information. Perplexity's API is popular with developers and businesses. Critics question Perplexity's ethics, but the company cites its sources and shares ad revenue.
Perplexity has raised $63 million and is aiming for more.
API: Application Programming Interface. Allows different software to communicate.
Web scraping: Extracting data from websites.
AI Tool Critiques Social Media Personalities with Humor
An AI tool critiques Twitter and Weibo accounts: enter an account ID and get a personality report. Reports on Musk and Swift are included.
The tool was developed in China, mimicking a foreign success. The technical barrier is low, and entertainment value is the key. Many AI startups fail.
Insight: Novelty and humor drive these tools. They offer a break from routine, a new view of familiar figures. Innovation doesn't always need grandeur. Sometimes, simple fun ideas suffice.
Explanation:
- AI tool: Software that uses artificial intelligence to perform tasks.
- Twitter and Weibo: Social media platforms, popular for sharing short messages and engaging with public figures.
- Personality and activity reports: Analyses generated by the AI tool, providing insights into user behavior and character traits.
- Tech barrier: The difficulty level in creating a technology or product.
- Entertainment value: The appeal or amusement factor of a product.