Jul 24, 2025
Grant Forbes
“Search” is a complex and occasionally nebulous concept, one that can be decomposed into a number of different approaches. Each of these approaches has its own advantages and disadvantages, and Inquisite combines all of them to best provide relevant, comprehensive results to researchers. In this article, I’ll give a high-level breakdown of the main forms that search can take, and the pros and cons of each component.
Keyword Search
Back in the day, keyword search was just called “search.” It’s essentially what it says on the tin: it scans some corpus of documents for some word or set of words. It can be somewhat more sophisticated than that, with the use of regular expressions and similar schemes, but this is fundamentally the simplest type of search, and the expressiveness added by regular expressions is limited by the creativity and search-related expertise of the person writing the query. This isn’t to say that these tools necessarily require regex expertise from the end user: they can be (and usually are) mediated by a more user-friendly frontend (dropdown menus with categories for “and” and “not,” for example, or filters on document metadata such as date). However, this complexity still risks placing a cognitive burden on the end user if these tools are to be used optimally.

Figure 1: File browser search bars generally use keyword search. Given this, most users will likely have experienced some of the pitfalls of keyword search firsthand.
Keyword search is best, then, when the task the user has in mind can easily be decomposed into a set of keywords that the user knows, and which by themselves exhaustively cover the space of interesting items while including little else. For example, a highly standardized scientific domain with specific, agreed-upon terms with exact meanings is one in which keyword search will likely excel. In such a domain, keyword search will almost always be the fastest, simplest, and most computationally efficient search method. However, there are many situations in which one or more of these assumptions are violated: any domain where words have multiple definitions, for example, or where synonyms and imprecise language are common. In these scenarios, more complex forms of search are often more successful.
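To make this concrete, here’s a minimal sketch of what keyword search boils down to, using Python regular expressions with simple “and”/“not” semantics. The corpus, terms, and function names are invented for illustration; this is not Inquisite’s actual implementation.

```python
import re

# A minimal sketch of keyword search: keep documents containing every required
# term ("and") and none of the excluded terms ("not").
def keyword_search(corpus, required_terms, excluded_terms=()):
    hits = []
    for doc_id, text in corpus.items():
        lowered = text.lower()
        # Word-boundary regexes so "cat" doesn't match "concatenate".
        has_all = all(re.search(rf"\b{re.escape(t.lower())}\b", lowered) for t in required_terms)
        has_none = not any(re.search(rf"\b{re.escape(t.lower())}\b", lowered) for t in excluded_terms)
        if has_all and has_none:
            hits.append(doc_id)
    return hits

corpus = {
    "doc1": "CRISPR-Cas9 gene editing in zebrafish embryos.",
    "doc2": "A survey of transformer architectures for search.",
}
print(keyword_search(corpus, ["gene", "editing"], excluded_terms=["mouse"]))  # ['doc1']
```

As the example suggests, the method only finds documents that literally contain the chosen terms: a paper that says “genome modification” instead of “gene editing” would be missed entirely.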
Semantic Search
Semantic search fundamentally leverages the notion of “vector embeddings”: lists of numbers that represent a word or part of a word (also called “tokens”) in some abstract, high-dimensional representation space. The particular list of numbers assigned to a given word is decided by an algorithm that aims to place words (or parts of words) that occur in similar contexts as near to each other as possible. The end result is that the lists of numbers are not arbitrary, but in fact represent the word’s location in a space of meanings. The most famous example can be found in the abstract of one of the original “word2vec” papers that introduced the technique: in their vector embedding space, “queen” is approximately equal to “king” + “woman” - “man.” Semantic relationships, in other words, can be represented in a vector space in a way that is mathematically meaningful. There’s an approximate vector for the concept of “gender,” and the authors find similarly consistent vector offsets for other concepts that differentiate words, like plurality, past tense, and more.
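Here’s a toy sketch of that analogy arithmetic. The three-dimensional vectors in the `embeddings` dictionary are made up purely for illustration; a real embedding model (word2vec, GloVe, or similar) would supply learned vectors with hundreds of dimensions.

```python
import numpy as np

# Toy illustration of the word2vec-style analogy: "king" - "man" + "woman" ≈ "queen".
# These vectors are invented; a trained model would provide the real ones.
embeddings = {
    "king":  np.array([0.8, 0.65, 0.1]),
    "queen": np.array([0.8, 0.05, 0.7]),
    "man":   np.array([0.1, 0.7, 0.05]),
    "woman": np.array([0.1, 0.1, 0.65]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Subtracting "man" and adding "woman" roughly moves along the "gender" direction.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
ranked = sorted(embeddings, key=lambda w: cosine(target, embeddings[w]), reverse=True)
print(ranked[0])  # "queen" (in this toy example)
```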
The best general-audience explanation of vector embeddings I’ve ever seen is in the “Word embeddings” section of this video, which also explains some of the high-level architecture behind Large Language Models (LLMs). If you want to get a more intuitive sense of what vector embeddings are really like, I’d highly recommend the game Semantle, which is like Wordle in that there’s a new word to guess every day, but instead of getting feedback on the overlapping letters between words, you get feedback on how similar the vector embeddings of the secret word and your guess are: in other words, how similar they are conceptually. So “chair” and “stool,” for example, will have relatively similar vectors, despite sharing no letters in common. These embeddings help LLMs learn to generate semantically meaningful text, but they do more than that: we can leverage them as a way to search in a space of concepts, rather than just words.

Figure 2: Pictured are my guesses and their similarity scores relative to the correct answer for the Semantle puzzle on the day I was writing this article. Note that words more similar to the true answer have higher scores (on this particular day, most of the most semantically similar words also sound similar, but this isn’t always the case). Note also the lightbulb emojis: these mean that I used two hints, because Semantle is a hard game.
As it turns out, if you take every word in a document and combine their vector embeddings, you get a vector that meaningfully captures the overall meaning of that document in the embedding space. Similarly, if you take a natural-language query and combine the embeddings of its tokens, you get a meaningful representation of that query’s meaning. Given a corpus of documents, then, one can narrow that corpus down to a smaller list of documents likely to be relevant to a given query by checking which documents’ vector embeddings are most similar to the vector embedding of the query. Typically this comparison uses cosine similarity, a metric that quantifies how near to the same direction two vectors point.
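As a rough sketch of how this looks in practice, the snippet below ranks documents against a query by cosine similarity. The `embed(text)` function is a placeholder for whatever embedding model is used (for example, one that mean-pools token embeddings into a single vector); it is assumed, not provided here.

```python
import numpy as np

def cosine_similarity(a, b):
    # How close to the same direction two vectors point (1.0 = identical direction).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query, documents, embed, top_k=5):
    # `embed` maps a string to a fixed-size vector; it stands in for a real model.
    query_vec = embed(query)
    scored = [(doc_id, cosine_similarity(query_vec, embed(text)))
              for doc_id, text in documents.items()]
    # Documents whose embeddings point in the most similar direction to the
    # query's embedding are treated as the most likely to be relevant.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```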
This form of search can easily find semantically relevant passages in a large corpus of documents, including ones a keyword search may have missed. However, it can also require substantial preprocessing and tokenization of the corpus, and it can struggle with any semantic context that is not fully captured by the embedding method being used or by the way embeddings are combined. As a somewhat contrived example, say that a user wants to search for documents that begin by talking about Topic A and then switch to talking about Topic B. Because the sum of vector embeddings is agnostic to the order in which the words appear, it fundamentally cannot distinguish between such a document and one that begins with Topic B and ends with Topic A. While this particular example is fairly narrow, it supports the intuition that there are important aspects of meaning that can’t necessarily be captured through sums of vector embeddings, and thus that there are occasions when even more sophisticated forms of search are required.
Generative Search
The third and final form of search we use at Inquisite is what we’ll call “generative search.” It consists of prompting an LLM to evaluate the relevance of a particular document to a user’s query, either using its own judgment or a pre-defined set of criteria. Of the three methods in this article, this will often provide the most granular, detailed, and accurate ranking, as the LLM is able to evaluate documents holistically. This is not to say that it’s perfect: LLMs are known to be susceptible to “hallucinations,” which are factual inaccuracies in their generated output. But these can be minimized in several ways, and in the particular context of classifying articles (into “relevant,” “slightly relevant,” and “not relevant” categories, for instance), hallucinations can only manifest as classification noise, which is likely to be smaller than the noise introduced by the other search methods.
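At the level of a single document, generative search can look roughly like the sketch below. The `call_llm` function is a placeholder for whatever LLM client is actually used, and the prompt wording is illustrative; only the three-label classification scheme mirrors what’s described above.

```python
# A minimal sketch of generative search: ask an LLM to classify one document's
# relevance to a query, constrained to a fixed label set.
RELEVANCE_PROMPT = """You are assessing documents for a literature search.

Query: {query}

Document:
{document}

Classify the document's relevance to the query as exactly one of:
"relevant", "slightly relevant", or "not relevant". Answer with the label only."""

def classify_relevance(query, document, call_llm):
    prompt = RELEVANCE_PROMPT.format(query=query, document=document)
    label = call_llm(prompt).strip().lower()
    # Constraining the output to a fixed label set keeps any hallucination
    # confined to classification noise rather than free-form factual claims.
    valid = {"relevant", "slightly relevant", "not relevant"}
    return label if label in valid else "not relevant"
```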

Figure 3: ChatGPT’s response to the prompt ‘Generate an image for the "generative search" section of an article I'm writing about the different forms of search, particularly as they relate to AI and LLM use. Take a deep breath before generating the image, to make sure it's a really good one.’ Note the insistence that the LLM ‘take a deep breath,’ a prompt-engineering technique that has been shown to improve performance on some tasks, and which was itself first generated by an LLM in this DeepMind paper. Note also that ChatGPT decided that the GPT logo is the best universal symbol for LLMs.
Generative search, while particularly granular and powerful, is limited in its scalability in two important ways. First, it is computationally expensive: for a large corpus of documents, it is infeasible to simply run each of them through an LLM given the time and cost required. Second, LLMs can only hold so much “context” in memory at once: this is the amount of text they can process before beginning to lose track of the earliest portion. If any documents in the corpus are longer than the context window of the LLM being used for generative search, connections between the beginning of a document and its end can be lost: if a term is defined on the first page but used on the last, for example, the LLM may struggle to recall the definition when it encounters the term again, which substantially increases the risk of hallucinations and inaccurate relevance judgments. This can be mitigated in a variety of ways, however, for example by splitting longer documents into smaller sections and feeding them in individually, or by truncating sections estimated to be less relevant. Overall, when the volume of text to be searched is relatively small (or a larger volume can be narrowed down to a relatively small relevant subset by other means), generative search is likely to be the most effective method available, assuming it is feasible within compute constraints.
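One of those mitigations, splitting a long document into chunks that each fit in the context window, might look roughly like this. Chunk sizes are counted in words here purely for simplicity; a real pipeline would count model tokens, and the specific sizes are arbitrary.

```python
# A minimal sketch of chunking: split a long document into overlapping pieces
# that each fit within an LLM's context window, so each piece can be scored
# separately by generative search.
def chunk_document(text, max_words=800, overlap=100):
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Overlap consecutive chunks so that, e.g., a definition near a chunk
        # boundary isn't completely separated from the text that uses it.
        start = end - overlap
    return chunks
```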
Putting It All Together
All three search methods we’ve discussed here have their strengths and weaknesses, as compiled in the table below.
| | Keyword Search | Semantic Search | Generative Search |
| --- | --- | --- | --- |
| Pros | Fast, simple, and computationally cheap; excels when a domain has precise, agreed-upon terms | Finds conceptually relevant documents even when they share no keywords with the query | Most granular, detailed, and accurate ranking; evaluates documents holistically |
| Cons | Misses synonyms, ambiguous terms, and imprecise language; places a cognitive burden on the user | Requires preprocessing and embedding of the corpus; loses word order and other context when embeddings are combined | Computationally expensive; limited by the LLM’s context window; susceptible to hallucinations |
At Inquisite, we have built a multi-step search pipeline that combines these methods to get the best of all worlds. At a high level, our pipeline starts by executing multiple keyword and semantic searches over a large corpus of texts (research papers, FDA guidances, etc.), then uses LLM-based methods to narrow the results down to a smaller set of documents, on which we can more efficiently and effectively apply generative search to precisely identify the most useful text in the corpus given the user’s query. We also employ LLMs to review the contents of the final set of identified text sources and determine whether we have collected enough information to sufficiently address the query. If not, the process restarts with a different set of initial queries. This approach allows us to leverage the computational efficiency of simpler search methods, while gaining the depth and nuance of more complex methods on the documents for which it’s truly beneficial.
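In very rough terms, that flow looks something like the sketch below. Every stage function passed in (keyword search, semantic search, LLM filtering, generative ranking, sufficiency checking, query reformulation) is a placeholder for the corresponding step, not our actual implementation.

```python
# A rough sketch of the multi-step pipeline described above. `stages` is a dict
# of placeholder callables supplied by the caller, one per pipeline step.
def search_pipeline(user_query, corpus, stages, max_rounds=3):
    queries = [user_query]
    results = []
    for _ in range(max_rounds):
        # 1. Cheap, broad recall: multiple keyword and semantic searches over the corpus.
        candidates = set()
        for q in queries:
            candidates |= set(stages["keyword_search"](corpus, q))
            candidates |= set(stages["semantic_search"](corpus, q))

        # 2. LLM-based narrowing to a smaller set of documents worth reading closely.
        shortlist = stages["llm_filter"](user_query, candidates)

        # 3. Generative search: detailed relevance ranking over the shortlist.
        results = stages["generative_rank"](user_query, shortlist)

        # 4. Check whether the collected text sufficiently addresses the query;
        #    if not, reformulate the initial queries and try another round.
        if stages["is_sufficient"](user_query, results):
            break
        queries = stages["reformulate_queries"](user_query, results)
    return results
```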
This article has only scratched the surface of search methods: there are many methods not discussed here, as well as nuances and complexities within those that are. Even within Inquisite’s pipeline itself, we leverage additional search methodologies: our clinical trial search uses a proprietary knowledge graph, for instance, and searches based on connectedness within that graph structure. But that’s an article for another time. Hopefully I’ve given you a better understanding of the basic types of search used in Inquisite’s main pipeline, the benefits and risks of each, and how we leverage them all to get the best result.