NLAPI Training Data and Calls Documentation
The NLAPI (Natural Language API) provides developers with tools to evaluate and enhance the performance of their applications. By leveraging Training Data and Calls, developers can monitor how well the NLAPI performs across various tasks and scenarios specific to their use case. Additionally, Benchmarks are used to systematically assess and score the NLAPI's effectiveness, ensuring consistent and reliable performance.
Training Data serves as the foundation for evaluating and improving the NLAPI's performance. Developers can store application calls, analyze them, and mark specific entries as benchmarks to gauge the NLAPI's accuracy and efficiency. Training Data with is_benchmark
set to true is included in the benchmark for that application. The benchmark takes into consideration the conversation history, user_input, and context, and is evaluated on which endpoints are called and in what order.
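A minimal sketch of what such a training-data entry might look like is shown below; apart from is_benchmark, user_input, and context (which this document names), the field names are assumptions for illustration only.

```python
# Hedged sketch of a benchmark (training-data) entry. Field names other than
# is_benchmark, user_input, and context are illustrative assumptions.
benchmark_entry = {
    "is_benchmark": True,
    "user_input": "Show me all open invoices for Acme Corp",
    "conversation_history": [],            # prior turns, if any
    "context": {"company_id": 42},         # extra hints supplied with the request
    # Expected endpoint calls, grouped by step: calls in the same inner list
    # may be made together, but the outer order must be respected.
    "expected_calls": [
        ["GET /companies?name=Acme Corp"],
        ["GET /invoices?status=open&company_id=42"],
    ],
}
```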
Benchmarking allows developers to assess the NLAPI's performance against predefined standards. By setting up benchmarks, developers can systematically evaluate how well the NLAPI handles specific tasks and scenarios within their application.
Benchmarking is crucial because it provides a systematic way to evaluate the performance and reliability of the NLAPI. By running benchmarks, developers can determine whether the NLAPI will work for them and find areas where the NLAPI excels and areas that need improvement. This allows a developer to accurately evaluate which agent version is right for them (see Agent_List.md) and/or whether they need to train a custom model. (See Training NLAPI)
This process ensures that the NLAPI meets the desired standards and performs optimally in real-world scenarios. Completing a benchmark before production deployment gives the developer insight into how the NLAPI will perform across a variety of requests. Before changing agents, you should always run a benchmark to confirm the switch is beneficial for your application. Likewise, if any major changes or updates are made to an API schema, run another benchmark to make sure the NLAPI understands the changes. (Make sure your benchmarks cover any new routes, and update any benchmarks that reference old routes.)
Each benchmark is scored based on the accuracy and order of endpoint calls:
1.0 Point: All endpoints are called correctly and in the exact expected order.
0.8 Points: All endpoints are called correctly, but the order differs in an acceptable way (e.g., [[call1], [call2], [call3]] vs. the expected [[call1, call2], [call3]]).
The overall benchmark score is calculated as: Score = Total Points Scored / Total Benchmarks.
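As a purely hypothetical illustration of the formula, suppose an application has five benchmarks and one of them misses the expected endpoints entirely (assumed here to score 0 points):

```python
# Hypothetical scores for five benchmarks: three exact matches, one with an
# acceptable ordering difference, and one assumed miss scored at 0 points.
points = [1.0, 1.0, 1.0, 0.8, 0.0]

score = sum(points) / len(points)   # Total Points Scored / Total Benchmarks
print(f"{score:.2f}")               # 0.76
```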
To run a benchmark, developers can use the /porta/jobs/benchmark
endpoint. This process involves supplying the necessary agent and application identifiers to initiate the benchmark tests.
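A minimal sketch of starting a benchmark run over HTTP, assuming a REST-style POST with a bearer token; the base URL, auth header, and payload field names (agent_id, application_id) are assumptions, not confirmed by this document.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Kick off a benchmark job for a given agent and application.
response = requests.post(
    f"{BASE_URL}/porta/jobs/benchmark",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "agent_id": "agent-123",        # assumption: which agent version to benchmark
        "application_id": "app-456",    # assumption: which application's benchmarks to run
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```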
Training LLMs can be a costly endeavor, but it is sometimes necessary to achieve the desired performance from an LLM or Agent. Here are some pre-training considerations and advice to help you make the most of your resources:
Run a Benchmark Prior to Training a New Agent:
Before embarking on training a new Agent, it's crucial to run a benchmark. This helps you understand the current performance of the NLAPI and identify specific areas that require improvement. Benchmarks provide a baseline to measure the effectiveness of any training you undertake, ensuring that the training process is both targeted and efficient.
Optimize Descriptions of Endpoints and Database Comments:
Review and enhance the descriptions of your endpoints in your OpenAPI specifications and database comments in your devii/graphql setup. Clear and detailed descriptions can sometimes improve the NLAPI's performance without the need for extensive training. This optimization can guide the NLAPI to better understand and handle requests, potentially saving time and resources.
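For example, the difference between a terse and a descriptive endpoint definition might look like the fragment below (expressed as Python dictionaries for consistency with the other sketches; the endpoint itself is made up):

```python
# Illustrative only: the same OpenAPI path described tersely vs. descriptively.
terse = {
    "/invoices": {
        "get": {"summary": "Get invoices"},
    },
}

descriptive = {
    "/invoices": {
        "get": {
            "summary": "List invoices, optionally filtered by status and company",
            "description": (
                "Returns invoices for the authenticated account. Use the "
                "'status' query parameter (open, paid, void) and 'company_id' "
                "to narrow results. Results are paginated with 'page' and 'limit'."
            ),
        },
    },
}
```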
Utilize Context to Guide the NLAPI:
Providing additional context can help steer the NLAPI in the right direction. Contextual information can clarify user intents and improve the accuracy of the NLAPI's responses, reducing the need for further training.
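As a hedged sketch, extra context might be attached to a request alongside the user's message like this; the field names here are assumptions for illustration:

```python
# Hedged sketch: field names are assumptions, shown only to illustrate how
# contextual hints can accompany a user's message.
payload = {
    "user_input": "What did I spend last month?",
    "context": {
        "current_user_id": 17,
        "today": "2024-05-01",
        "preferred_currency": "USD",
    },
}
```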
Training an LLM is not only expensive but also requires ongoing updates as your application evolves. As your application changes, the model may need to be retrained to maintain its effectiveness. This continuous cycle of training and updating can be resource-intensive.
For further assistance or inquiries, please reach out to the developer support team or consult the detailed API documentation.
You can view all your application calls to the NLAPI from the .../calls
endpoint. A call is exactly the input you sent to the NLAPI and the response the NLAPI sent back. It also contains additional information such as cost and timestamp.
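A minimal sketch of listing calls, assuming an illustrative base URL and bearer-token auth; the '...' prefix in the path is left as documented rather than filled in.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Hedged sketch: '...' stands for the path prefix documented in the API Reference.
response = requests.get(
    f"{BASE_URL}/.../calls",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
for call in response.json():
    # Each call records the exact input sent to the NLAPI and its response,
    # plus metadata such as cost and timestamp.
    print(call.get("timestamp"), call.get("cost"))
```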
You can create benchmarks and training data from calls through the .../training-data
endpoints. For more information, read the API Reference Documentation.
For the NLAPI, a benchmark is a training-data object with the field is_benchmark
set to true (create one with the POST training-data route). A benchmark is a test case that assesses the NLAPI's ability to handle specific inputs (user_input, conversation history, and context) and execute the correct sequence of endpoint calls. Benchmarks help identify the strengths and weaknesses of the NLAPI, providing insights into areas that may require optimization. (See the Training NLAPI section.)
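A hedged sketch of creating a benchmark via the POST training-data route; the base URL, auth header, path prefix, and field names other than is_benchmark, user_input, and context are assumptions.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Hedged sketch: '...' stands for the documented path prefix, and field names
# other than is_benchmark, user_input, and context are illustrative assumptions.
response = requests.post(
    f"{BASE_URL}/.../training-data",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "is_benchmark": True,
        "user_input": "Cancel my latest order",
        "conversation_history": [],
        "context": {"user_id": 99},
        "expected_calls": [
            ["GET /orders?user_id=99&limit=1"],
            ["DELETE /orders/{id}"],
        ],
    },
    timeout=30,
)
response.raise_for_status()
```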
Agent ID List:
API Reference Documentation:
Developer Support: