Highlights:

  • WhyLabs has introduced the LangKit toolkit, specifically tailored to cater to the monitoring requirements of large language models.
  • LangKit’s standout feature lies in its capacity to identify AI hallucinations, which refer to instances where a language model generates fabricated information in its responses.

WhyLabs Inc., a startup, unveiled LangKit, an open-source toolkit specifically developed to enable companies to monitor their large language models for potential safety issues and other risks.

WhyLabs, headquartered in Seattle, has secured USD 14 million in funding from prominent investors such as Bezos Expeditions, Madrona Venture Group, and AI Fund, a venture capital firm led by AI pioneer Andrew Ng. The company offers a platform that enables organizations to effectively monitor their AI models and training datasets for technical issues.

WhyLabs has introduced the LangKit toolkit, specifically tailored to cater to the monitoring requirements of large language models. According to the company, this software can effectively identify common issues in such models.

CEO and Co-founder Alessya Visnjic said, “We have been working with the industry’s most advanced AI/ML teams for the past year to build an approach for evaluating and monitoring generative models; these efforts culminated in the creation of LangKit.”

LangKit’s standout feature lies in its capacity to identify AI hallucinations, which refer to instances where a language model generates fabricated information in its responses. Additionally, the toolkit can detect toxic AI output and identify situations where a model unintentionally exposes sensitive business information from its training dataset.

LangKit offers a range of monitoring capabilities that specifically assist companies in tracking the usability of their models. The toolkit allows for the monitoring of AI responses’ relevance to users’ questions, as well as the evaluation of response readability.

According to WhyLabs, the LangKit toolkit enables companies to monitor model output and user input. It focuses explicitly on detecting malicious prompts that may be sent to a language model as part of a process known as AI jailbreaking. AI jailbreaking refers to bypassing the built-in guardrails of a neural network, tricking it into producing output that would typically be blocked.

The open-source nature of LangKit enables users with advanced needs to expand their capabilities by incorporating custom monitoring metrics. This level of flexibility empowers companies to effectively monitor additional aspects of their AI models that are not initially included in the toolkit’s default tracking features.

By configuring LangKit, users can set up alerts for specific technical issues that may arise. The software also provides visual representations in graphs for the collected error information. Administrators can refer to these graphs to assess whether there is a potential decrease in the accuracy of a language model over time, a phenomenon commonly referred to as AI drift.

LangKit also simplifies the task of testing AI code updates. With the toolkit, software teams can input a set of test prompts into a model before and after making a code change. By comparing the responses generated by the AI, developers can assess whether the update has improved or unintentionally decreased the quality of the responses.