Reinforcement Learning From Human Feedback Took Travel AI Tool To Near-Perfect Accuracy

In late 2022, just before the launch kicked off the current AI frenzy, our development team had the opportunity to experiment with the model. We were a travel publisher with no plans on becoming a tech company. But it was obvious to us that this technology could be used to plan and book trips in a much more efficient, enjoyable way.

Within months we launched an AI tool that travelers could message via . Its accuracy was about 85%. That might not sound terrible, but when roughly one of every six conversations includes miscommunication or hallucinations, what you have is a fun gadget, not game-changing tech.

Thanks to our travel media platform, we were able to attract a critical mass of users, which allowed us to improve performance through reinforcement learning from human feedback. Over the next 15 months, we were able to increase accuracy to 98%, which has enabled us to strike partnerships with major travel brands, win awards and draw in more than a million users. Here’s how we did it.

A helping human hand

It’s helpful when users tell AI when an answer is wrong, which is the simplest form of reinforcement learning. If someone asks for restaurant recommendations in the Pearl District in Portland and the AI includes a recommendation in the Hawthorne District, the user may point out the inaccuracy. But relying on direct user feedback isn’t enough.

We hired five people, most of whom speak multiple languages, to put reinforcement learning into high gear. To date they’ve monitored 1.5 million conversations between users and the AI. These agents catch subtle miscommunications. If a user asks for recommendations of the best kid-friendly resorts in Mexico, the AI might ask the user to specify a city, assuming they want hotel rates. But the user doesn’t know yet; they’re just looking for general information.

At this point the agent is able to intervene, manually taking over the conversation and getting it back on track. Then the agent flags and categorizes the issue for a backend fix, which improves the system for an entire category of questions.
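As a rough illustration of the flag-and-categorize step, here is a minimal Python sketch. The `Flag` record, its field names and the category labels are assumptions made for the example, not a description of our production tooling. It simply tallies agent flags by category so the most common class of miscommunication can be fixed first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Flag:
    """One issue an agent flagged in a conversation (hypothetical schema)."""
    conversation_id: str
    category: str  # e.g. "premature_clarification" (illustrative label)
    note: str


def rank_categories(flags: Iterable[Flag]) -> List[Tuple[str, int]]:
    """Return (category, count) pairs, most frequently flagged first."""
    return Counter(flag.category for flag in flags).most_common()


flags = [
    Flag("c1", "premature_clarification", "Asked for a city before giving general info"),
    Flag("c2", "wrong_neighborhood", "Recommended Hawthorne for a Pearl District request"),
    Flag("c3", "premature_clarification", "Asked for dates on a general question"),
]
print(rank_categories(flags))
```

Ranking by count is only one possible way to prioritize; a real queue might also weigh how badly each category derails a conversation.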

Reframing the question

Sometimes inaccuracies are a result of the way the question is asked. To improve outcomes, we needed to improve the quality of the questions. We developed a system that categorizes and reframes questions before they are fed into the large language model. This process ensures that we get the most from our extensive site indexing.

Questions about live events initially posed a challenge. A query like, “What are some events going on in Estes Park, Colorado, this weekend?” might find a page about events from two years ago that includes the phrase “this weekend,” causing a hallucination. But what is the user really asking? The timing of the question needs to be translated into a specific date, where “this weekend” becomes “Jan. 25-26, 2025.”
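The date translation described above can be sketched in a few lines of Python. This is an illustrative version only: the function names are hypothetical, and it assumes “this weekend” always means the nearest Saturday and Sunday, which a production system would need to refine for time zones and other phrasings.

```python
from datetime import date, timedelta
from typing import Tuple


def resolve_this_weekend(today: date) -> Tuple[date, date]:
    """Return the (Saturday, Sunday) that "this weekend" refers to."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if today.weekday() == 6:
        # On a Sunday the weekend is already under way.
        saturday = today - timedelta(days=1)
    else:
        saturday = today + timedelta(days=5 - today.weekday())
    return saturday, saturday + timedelta(days=1)


def reframe(question: str, today: date) -> str:
    """Swap the relative phrase for explicit dates before retrieval."""
    saturday, sunday = resolve_this_weekend(today)
    explicit = f"on {saturday.isoformat()} and {sunday.isoformat()}"
    return question.replace("this weekend", explicit)


print(reframe(
    "What are some events going on in Estes Park, Colorado, this weekend?",
    date(2025, 1, 22),
))
```

With explicit dates in the query, an old page that merely contains the words “this weekend” no longer looks like a match.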

Another challenge is combining questions across multiple messages. Someone might ask for recommendations in Vancouver, then follow up with “close to Yaletown.” The underlying question needs to roll in new elements as they are added: “Recommend Airbnbs in the Yaletown area of Vancouver.”
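One way to picture this rolling-in of new elements is a small running query that each message updates. The sketch below is hypothetical: the `TripQuery` class and its `topic`, `city` and `area` fields are invented for illustration, and it assumes an upstream step has already extracted those fields from each message.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TripQuery:
    """What the user has asked for so far (hypothetical structure)."""
    topic: Optional[str] = None
    city: Optional[str] = None
    area: Optional[str] = None

    def merge(self, **slots: Optional[str]) -> "TripQuery":
        """Return a new query in which non-empty new slots override old ones."""
        return replace(self, **{k: v for k, v in slots.items() if v})

    def to_question(self) -> str:
        """Render the accumulated state as one standalone question."""
        where = self.city or "the destination"
        if self.area:
            where = f"the {self.area} area of {where}"
        return f"Recommend {self.topic or 'things to do'} in {where}."


first = TripQuery().merge(topic="Airbnbs", city="Vancouver")
follow_up = first.merge(area="Yaletown")  # the user adds "close to Yaletown"
print(follow_up.to_question())
```

Because each follow-up produces a complete, standalone question, the language model never has to guess what “close to Yaletown” refers to.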

Ping the partner

Site indexing is essential. For in-depth knowledge and real-time information, you need partners and data sources you can ping behind the scenes. Once we improved the ability to accurately identify the intent of the user, we needed a network of plugins to get the data they were seeking for flight times, hotel pricing and exchange rates.

When a user asks a question, our AI categorizes it as a particular intent, sources the appropriate data, and feeds the result into the LLM to deliver the information in coherent, consistent and conversational language. There’s a lot more going on behind the scenes than the baseline ChatGPT, but the user experience is the same and responses are noticeably richer and more accurate.

Creating a plugin for every type of intent is intensive. As you work through it, it’s important to communicate to the user in a friendly way what your AI can’t yet do. A response from the AI saying, “I don’t yet have that capability,” provides a better user experience than a hallucination, and it’s a great way of maintaining accuracy while building out your product.
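The flow from intent to plugin to language model, with an honest fallback when no plugin exists, might look like the following Python sketch. Everything in it is an assumption for illustration: the keyword-based `classify_intent` stands in for a real classifier, the intent names are made up, and the plugin and LLM are passed in as plain callables rather than real services.

```python
from typing import Callable, Dict

FALLBACK = "I don't yet have that capability."


def classify_intent(question: str) -> str:
    """Toy keyword classifier standing in for the real intent model."""
    q = question.lower()
    if "flight" in q:
        return "flight_times"
    if "hotel" in q:
        return "hotel_pricing"
    if "exchange rate" in q:
        return "exchange_rates"
    return "unknown"


def answer(
    question: str,
    plugins: Dict[str, Callable[[str], str]],
    llm: Callable[[str, str], str],
) -> str:
    """Route a question to its plugin, then let the LLM phrase the result."""
    plugin = plugins.get(classify_intent(question))
    if plugin is None:
        # No data source for this intent yet: say so instead of guessing.
        return FALLBACK
    return llm(question, plugin(question))


# Stub plugin and stub LLM; both return placeholder text, not real data.
plugins = {"exchange_rates": lambda q: "stub exchange-rate data"}
llm = lambda q, data: f"Here is what I found: {data}"

print(answer("What is the exchange rate for euros?", plugins, llm))
print(answer("Any flight to Denver tomorrow?", plugins, llm))
```

Note that the second question is recognized as a flight query but still gets the fallback, because no flight plugin is registered. The fallback covers both unknown intents and known intents that have not been built out yet.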

is the chief technology officer at , a leading travel publisher and creator of the award-winning AI travel genius .
