NLAPI Training Data and Calls Documentation
The NLAPI (Natural Language API) provides developers with tools to evaluate and enhance the performance of their applications. By leveraging Training Data and Calls, developers can monitor how well the NLAPI performs across various tasks and scenarios specific to their use case. Additionally, Benchmarks are used to systematically assess and score the NLAPI's effectiveness, ensuring consistent and reliable performance.
Training Data serves as the foundation for evaluating and improving the NLAPI's performance. Developers can store application calls, analyze them, and mark specific entries as benchmarks to gauge the NLAPI's accuracy and efficiency. Training Data with is_benchmark
set to true is included in the benchmark for that application. The benchmark takes into consideration the conversation history, user_input, and context, and is evaluated on which endpoints are called and in what order.
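A minimal sketch of what such a training-data entry might look like is shown below; apart from is_benchmark, user_input, and context (which this document names), the field names are assumptions for illustration only.

```python
# Hedged sketch of a benchmark (training-data) entry. Field names other than
# is_benchmark, user_input, and context are illustrative assumptions.
benchmark_entry = {
    "is_benchmark": True,
    "user_input": "Show me all open invoices for Acme Corp",
    "conversation_history": [],            # prior turns, if any
    "context": {"company_id": 42},         # extra hints supplied with the request
    # Expected endpoint calls, grouped by step: calls in the same inner list
    # may be made together, but the outer order must be respected.
    "expected_calls": [
        ["GET /companies?name=Acme Corp"],
        ["GET /invoices?status=open&company_id=42"],
    ],
}
```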
Benchmarking allows developers to assess the NLAPI's performance against predefined standards. By setting up benchmarks, developers can systematically evaluate how well the NLAPI handles specific tasks and scenarios within their application.
Benchmarking is crucial because it provides a systematic way to evaluate the performance and reliability of the NLAPI. By running benchmarks, developers can determine whether the NLAPI will work for them and find areas where the NLAPI excels and areas that need improvement. This allows a developer to accurately evaluate which agent version is right for them (see Agent_List.md) and/or whether they need to train a custom model. (See Training NLAPI)
This process ensures that the NLAPI meets the desired standards and performs optimally in real-world scenarios. Completing a benchmark before production deployment gives the developer insight into how the NLAPI will perform across a variety of requests. Before changing agents, you should always run a benchmark to confirm the switch is beneficial for your application. Likewise, if any major changes or updates are made to an API schema, run another benchmark to make sure the NLAPI understands the changes. (Make sure your benchmarks cover any new routes, and update any benchmarks that reference old routes.)
Each benchmark is scored based on the accuracy and order of endpoint calls:
1.0 Point: All endpoints are called correctly and in the exact expected order.
0.8 Points: All endpoints are called correctly, but the order differs in an acceptable way (e.g., [[call1], [call2], [call3]] vs. the expected [[call1, call2], [call3]]).
The overall benchmark score is calculated as: Score = Total Points Scored / Total Benchmarks.
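As a purely hypothetical illustration of the formula, suppose an application has five benchmarks and one of them misses the expected endpoints entirely (assumed here to score 0 points):

```python
# Hypothetical scores for five benchmarks: three exact matches, one with an
# acceptable ordering difference, and one assumed miss scored at 0 points.
points = [1.0, 1.0, 1.0, 0.8, 0.0]

score = sum(points) / len(points)   # Total Points Scored / Total Benchmarks
print(f"{score:.2f}")               # 0.76
```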
To run a benchmark, developers can use the /porta/jobs/benchmark
endpoint. This process involves supplying the necessary agent and application identifiers to initiate the benchmark tests.
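A minimal sketch of starting a benchmark run over HTTP, assuming a REST-style POST with a bearer token; the base URL, auth header, and payload field names (agent_id, application_id) are assumptions, not confirmed by this document.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Kick off a benchmark job for a given agent and application.
response = requests.post(
    f"{BASE_URL}/porta/jobs/benchmark",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "agent_id": "agent-123",        # assumption: which agent version to benchmark
        "application_id": "app-456",    # assumption: which application's benchmarks to run
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```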
Training LLMs can be a costly endeavor, but it is sometimes necessary to achieve the desired performance from an LLM or Agent. Here are some pre-training considerations and advice to help you make the most of your resources:
Run a Benchmark Prior to Training a New Agent:
Before embarking on training a new Agent, it's crucial to run a benchmark. This helps you understand the current performance of the NLAPI and identify specific areas that require improvement. Benchmarks provide a baseline to measure the effectiveness of any training you undertake, ensuring that the training process is both targeted and efficient.
Optimize Descriptions of Endpoints and Database Comments:
Review and enhance the descriptions of your endpoints in your OpenAPI specifications and database comments in your devii/graphql setup. Clear and detailed descriptions can sometimes improve the NLAPI's performance without the need for extensive training. This optimization can guide the NLAPI to better understand and handle requests, potentially saving time and resources.
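For example, the difference between a terse and a descriptive endpoint definition might look like the fragment below (expressed as Python dictionaries for consistency with the other sketches; the endpoint itself is made up):

```python
# Illustrative only: the same OpenAPI path described tersely vs. descriptively.
terse = {
    "/invoices": {
        "get": {"summary": "Get invoices"},
    },
}

descriptive = {
    "/invoices": {
        "get": {
            "summary": "List invoices, optionally filtered by status and company",
            "description": (
                "Returns invoices for the authenticated account. Use the "
                "'status' query parameter (open, paid, void) and 'company_id' "
                "to narrow results. Results are paginated with 'page' and 'limit'."
            ),
        },
    },
}
```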
Utilize Context to Guide the NLAPI:
Providing additional context can help steer the NLAPI in the right direction. Contextual information can clarify user intents and improve the accuracy of the NLAPI's responses, reducing the need for further training.
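As a hedged sketch, extra context might be attached to a request alongside the user's message like this; the field names here are assumptions for illustration:

```python
# Hedged sketch: field names are assumptions, shown only to illustrate how
# contextual hints can accompany a user's message.
payload = {
    "user_input": "What did I spend last month?",
    "context": {
        "current_user_id": 17,
        "today": "2024-05-01",
        "preferred_currency": "USD",
    },
}
```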
Training an LLM is not only expensive but also requires ongoing updates as your application evolves. As your application changes, the model may need to be retrained to maintain its effectiveness. This continuous cycle of training and updating can be resource-intensive.
For further assistance or inquiries, please reach out to the developer support team or consult the detailed API documentation.
You can view all your application calls to the NLAPI from the .../calls
endpoint. A call is exactly the input you sent to the NLAPI and the response the NLAPI sent back. It also contains additional information such as cost and timestamp.
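A minimal sketch of listing calls, assuming an illustrative base URL and bearer-token auth; the '...' prefix in the path is left as documented rather than filled in.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Hedged sketch: '...' stands for the path prefix documented in the API Reference.
response = requests.get(
    f"{BASE_URL}/.../calls",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
for call in response.json():
    # Each call records the exact input sent to the NLAPI and its response,
    # plus metadata such as cost and timestamp.
    print(call.get("timestamp"), call.get("cost"))
```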
You can create benchmarks and training data from calls through the .../training-data
endpoints. For more information, read the API Reference Documentation.
For the NLAPI, a benchmark is a training-data object with the field is_benchmark
set to true (create one with the POST training-data route). A benchmark is a test case that assesses the NLAPI's ability to handle specific inputs (user_input, conversation history, and context) and execute the correct sequence of endpoint calls. Benchmarks help identify the strengths and weaknesses of the NLAPI, providing insights into areas that may require optimization. (See the Training NLAPI section.)
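A hedged sketch of creating a benchmark via the POST training-data route; the base URL, auth header, path prefix, and field names other than is_benchmark, user_input, and context are assumptions.

```python
import requests

BASE_URL = "https://api.example.com"   # assumption: replace with the real NLAPI host
API_KEY = "YOUR_API_KEY"               # assumption: replace with your credentials

# Hedged sketch: '...' stands for the documented path prefix, and field names
# other than is_benchmark, user_input, and context are illustrative assumptions.
response = requests.post(
    f"{BASE_URL}/.../training-data",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "is_benchmark": True,
        "user_input": "Cancel my latest order",
        "conversation_history": [],
        "context": {"user_id": 99},
        "expected_calls": [
            ["GET /orders?user_id=99&limit=1"],
            ["DELETE /orders/{id}"],
        ],
    },
    timeout=30,
)
response.raise_for_status()
```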
Agent ID List:
API Reference Documentation:
Developer Support: