With the rapid adoption of large language models (LLMs), it became clear early on that real-world enterprise applications require something more than general AI knowledge. To be genuinely useful, especially within companies, models need to work not just with their own training data, but with internal company data—data that lives behind firewalls,...
How Can Internet Communication Technologies Boost Business Profits?
What's the best architecture for seamless communication? How can companies implement these systems efficiently? These are just some of the questions I explore on my blog, where I dive into the latest strategies and innovations in digital communication. 🚀
If you've already set up a vector database—I've previously covered the basics on this blog—you know how powerful they can be for semantic searches, document retrieval, clustering related content, and much more. But trust me, the real excitement kicks in when you combine this database with a Large Language Model (LLM).
One particularly useful technique when working with vectorized documents is similarity detection, which in practice often means identifying when an author may have copied or heavily borrowed from another source. Whether it's a literal copy/paste from another document or paraphrased content, comparing semantic embeddings lets us catch it.
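As a rough illustration of the idea, here is a minimal Python sketch that flags document pairs whose embeddings are unusually close. It assumes cosine similarity as the comparison measure, which is a common choice but not necessarily the one used in the post. The function names, the file names, the three-dimensional toy vectors, and the 0.95 threshold are all hypothetical; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.9):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Most similar pairs first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy embeddings, made up for illustration only
toy_embeddings = {
    "original.txt": [0.9, 0.1, 0.3],
    "paraphrase.txt": [0.85, 0.15, 0.35],
    "unrelated.txt": [0.1, 0.9, 0.2],
}

suspicious = find_similar_pairs(toy_embeddings, threshold=0.95)
for name_a, name_b, score in suspicious:
    print(f"{name_a} <-> {name_b}: {score:.3f}")
```

With these toy vectors only the original/paraphrase pair crosses the threshold. In practice the threshold needs tuning against known examples, since a high score signals similar meaning, not proof of copying.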
Clustering – A Powerful Tool for Categorization
One of the most common uses of AI in companies is performing semantic search within their own documents. At this URL, I present a tool for basic conversion of a series of documents in various formats into a PostgreSQL vector database.
Advantages and Disadvantages of pgvector Compared to Specialized Vector Databases
If you're involved in application development or data analytics, you've likely encountered the concept of "vectorization" or "embedding" text content. This process converts text into vector form, enabling computers to better understand the meaning behind words and sentences. It's essential for semantic search, recommendation systems, automatic...
How to Implement Semantic Search in Practice
When diving into the OpenAI API documentation, you'll notice there are two main ways to interact with the service. In previous examples, we focused on the Assistants API, which is at version 2 and still in beta. Alongside that, there's another option: the Agents SDK, a Python-based toolkit. Both approaches share some similarities, but depending on...
How to Work with Your Own Files in ChatGPT
When you want ChatGPT to include data from your own files in its responses (via semantic search), you have several options: