#13: LLM Observability in a notebook, Dawn of LLMs, and more..
ToolEmu: Identifying the risk of LLM Agents
A framework that uses an LM to emulate tool execution, enabling LM agents to be tested against a diverse range of tools and scenarios in a virtual sandbox, without manual instantiation, for risk assessment.
It also includes a safety evaluator that examines agent failures and quantifies the associated risks.
The demo can be accessed here and the paper here.
Phoenix: LLM Observability in a Notebook
Phoenix by ArizeAI provides a notebook-first experience for LLM applications, including LLM Traces, LLM Evals, Embedding Analysis, RAG Analysis, and Structured Data Analysis.
It supports popular libraries like LlamaIndex and LangChain.
Explore it here.
Paper: Preliminary Explorations with GPT-4V(ision)
This paper analyzes the latest model, GPT-4V(ision): the tasks it can perform, its supported inputs and working modes, and effective ways to prompt the model, with test samples that probe the quality and genericity of its capabilities.
It is a detailed report spanning 160+ pages, with a collection of carefully designed qualitative samples covering a variety of domains and tasks.
Read the paper here.
And More..
HuggingFace is working on a community-led Computer Vision course. If you are interested in contributing, check this tweet.
PyTorch announces a docathon, a community-driven documentation quality competition.