Chartered Institute of Linguists

Translation technology: easy definitions


Sometimes overwhelmed by tech? Or jargon and acronyms? You’re not alone. 


Everyone struggles to keep up these days. There are so many new tools and trends. And with all the hype around AI, it’s even harder to understand and test everything. 

That’s why we’ve developed this glossary. It contains some clear definitions related to technology in translation. This can help you talk to clients with more confidence. It may also give you ideas about areas where you might want to learn more.  

We will update this page as technology develops. Bookmark it so you can come back. 

Tip: if you’re looking for a specific term, use your browser’s search function (Ctrl + F / Cmd + F) to open the search box.


##A–C


Add-on: any software extension that adds functionality to an existing program, e.g. plugin, browser extension, macro. 

AI translation: translation produced using artificial intelligence, usually an artificial neural network, with the aim of generating more fluent, context-aware output.

Aligner: software that aligns source and target texts at sentence level to create a bilingual corpus or translation memory. Often exports to TMX format for use in a CAT tool. 

Application programming interface key (API key): digital credential used to connect software such as a CAT tool or TMS to a third-party service such as a machine translation engine. 

Artificial intelligence (AI): technology that enables machines to perform tasks that normally require human intelligence, e.g. learning, problem-solving and understanding language. 

Artificial neural network (ANN): computing system inspired by the brain, made of layers of connected nodes that process data and learn patterns to solve tasks like translation or image recognition. 

Augmented translation: a translator-controlled process where integrated AI tools provide context, suggestions and guidance. This can enhance the translator’s consistency, responsiveness and productivity. 

Automatic terminology extraction (ATE): process of automatically scanning a corpus to identify terms for glossaries or termbases. Reduces manual effort in terminology management. See also: terminology extraction tool (TET). 

Back translation: a method of producing synthetic data by translating target texts back into the source language, often used to train or evaluate machine translation systems. 

Bilingual search engine: a search engine that allows queries and returns results in two languages. This helps users find translations and information across different languages. 

CAT discount: reduced translation rate applied when using a CAT tool, typically for text segments that match existing translations saved in a translation memory. See also: translation match. 
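
As a rough illustration of how a CAT discount can be worked out, the sketch below turns a match analysis into a weighted word count. The band names and percentages are hypothetical, chosen only for this example. Real discount grids are agreed between translator and client.

```python
# Hypothetical discount grid: share of the full rate paid for each match band.
# These figures are illustrative only, not an industry standard.
RATE_SHARE = {
    "no_match": 1.00,
    "fuzzy": 0.60,
    "exact": 0.30,
}

def weighted_word_count(words_per_band):
    """Return the word count after applying the discount grid."""
    return sum(RATE_SHARE[band] * count for band, count in words_per_band.items())

# Word counts per band, as a CAT tool analysis report might list them.
analysis = {"no_match": 1000, "fuzzy": 500, "exact": 200}
weighted = weighted_word_count(analysis)
print(round(weighted))  # 1000 + 300 + 60 = 1360 weighted words
```

Multiplying the weighted word count by the full per-word rate gives the discounted fee.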

CAT tool: see computer-assisted translation tool. 

Cloud service: online tool or resource, e.g. software or storage, that you can use over the internet without installing it on your own computer. 

Computational linguistics: the study of using computers to process, analyse and model human language. The discipline combines linguistics, AI and computer science. 

Computer-assisted translation tool (CAT tool): software that helps translators work more efficiently by managing translation memory, terminology and consistency across documents. 

Concordance: a function in a CAT tool or corpus software that shows a word or phrase in its surrounding context across multiple texts. Helps ensure consistent terminology use. 
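
To show the idea behind a concordance search, here is a minimal keyword-in-context sketch in Python. The function name `concordance`, the `width` parameter and the sample sentences are assumptions made for this example. CAT tools and corpus software offer far richer versions of the same idea.

```python
import re

def concordance(texts, term, width=30):
    """Return keyword-in-context lines for every hit of `term` in `texts`."""
    # Whole-word, case-insensitive search for the term.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append(f"{left}[{match.group()}]{right}")
    return lines

docs = [
    "The supplier shall deliver the goods within 30 days.",
    "Goods damaged in transit must be reported to the supplier.",
]
hits = concordance(docs, "supplier", width=15)
for line in hits:
    print(line)
```

Each output line shows the term in square brackets with up to 15 characters of context on either side.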

Content management system (CMS): platform for publishing and managing digital content. Frequently integrated with a localisation tool to streamline the translation workflow. 

Corpus: a large, structured collection of texts used for linguistic analysis, research or training language and translation models. A corpus can be monolingual or multilingual. The plural form is corpora. Publicly available corpora include OPUS, COCA, Hansard and the UN parallel corpus. 

Corpus analysis tool: software for studying frequency, collocation and word usage across large text collections. Supports research, terminology and stylistic decisions. 
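
The two most basic corpus measures, word frequency and adjacent word pairs (a crude stand-in for collocation), can be sketched in a few lines. The function name and the tiny sample corpus are assumptions for illustration. Dedicated tools add statistical measures and much larger data.

```python
import re
from collections import Counter

def frequencies(texts):
    """Count single words and adjacent word pairs across `texts`."""
    words = Counter()
    pairs = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))  # each word with its right-hand neighbour
    return words, pairs

corpus = [
    "Translation memory saves time.",
    "A translation memory stores segments.",
]
words, pairs = frequencies(corpus)
print(words.most_common(2))
print(pairs.most_common(1))
```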

Crowdsourcing translation platform: system that relies on volunteer or community-based contributions for translation tasks. Useful for large-scale, low-budget projects but requires quality control. 


##D–K


Desktop automation tool: software that automates repetitive tasks, keyboard shortcuts and text expansion on a computer. 

Editing and/or proofreading tool: software that helps check and improve text for grammar, spelling, style and clarity. 

Encryption: security method for protecting data stored or transmitted in a CAT tool, translation management system or cloud service. Important for client confidentiality. 

File conversion tool: software that changes a file from one format to another so it can be opened or used in different programs, e.g. converting a PDF to Word or a video to text. 

Generative AI (GenAI): a type of artificial intelligence that creates new content, e.g. text, images or music, based on patterns learned from existing data. 

Generative pre-trained transformer (GPT): an AI language model trained on large text datasets to generate, understand and process human-like language for tasks like writing, summarising and translating. 

Hotstring: a shortcut that automatically expands a short sequence of characters into a longer text or performs a command, often used to speed up typing or repetitive tasks. 
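
The expansion step can be pictured as a simple lookup. The sketch below applies it to finished text, whereas a real desktop automation tool expands the abbreviation as you type. The abbreviations and the `expand` function are hypothetical.

```python
# Hypothetical abbreviations mapped to their expansions.
HOTSTRINGS = {
    ";tm": "translation memory",
    ";mtpe": "machine translation post-editing",
}

def expand(text, hotstrings=HOTSTRINGS):
    """Replace every abbreviation in `text` with its expansion."""
    # Handle longer abbreviations first so a shorter one cannot clobber them.
    for abbreviation in sorted(hotstrings, key=len, reverse=True):
        text = text.replace(abbreviation, hotstrings[abbreviation])
    return text

expanded = expand("Please update the ;tm before ;mtpe.")
print(expanded)
```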

Intelligent character recognition (ICR): technology that recognises and converts handwritten or printed characters into digital text. It has better accuracy than standard OCR for handwriting. 

Internet search operator: special symbol or word used in a search engine to refine queries and get better results, e.g. site: to limit results to a single website.

Keyword tool: software that helps identify, research and analyse keywords for search engine optimisation (SEO) or content targeting to reach specific audiences. 


##L–N 


Large language model (LLM): an AI system trained on massive amounts of text to understand and generate human-like language for tasks like writing, summarising or translating. 

Localisation automation / text-string technology: processes and tools that extract translatable text from code and automate its reintegration. This reduces the risk of breaking software. 
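
A minimal sketch of the extraction step, assuming the translatable text sits in plain double-quoted strings. The key format `MSG_1` and the lookup function `t` are hypothetical. Real localisation tools handle many file formats, escape sequences and strings that must not be translated.

```python
import re

# Matches a double-quoted string with no escapes inside (a simplification).
STRING_LITERAL = re.compile(r'"([^"\\]*)"')

def extract_strings(code):
    """Pull quoted strings out of `code` and swap in lookup keys."""
    catalogue = {}

    def to_key(match):
        key = f"MSG_{len(catalogue) + 1}"
        catalogue[key] = match.group(1)
        return f't("{key}")'  # t() is a hypothetical lookup function

    return STRING_LITERAL.sub(to_key, code), catalogue

code = 'label = "Save file"\nhint = "Press Enter"'
new_code, catalogue = extract_strings(code)
print(new_code)
print(catalogue)
```

The translator works only on the catalogue, so the surrounding code is never touched.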

Locked segment: in a translation project in a CAT tool, a portion of text that cannot be edited, usually to preserve approved terminology, formatting or verified translations. 

Machine learning / deep learning: the artificial intelligence methods that underlie modern natural language processing and neural machine translation.

Machine translation (MT): any automated translation system, including rule-based, statistical and neural methods. The main types are generic MT (prebuilt systems trained on broad data), customisable MT (engines trained with domain-specific or client-owned data) and adaptive MT (engines that update in real time based on translator edits). 

Machine translation engine: the software system that generates translations automatically. 

Machine translation post-editing (MTPE): reviewing and correcting text produced by machine translation to ensure it is accurate, grammatical and suitable for its intended purpose. Performed by a post-editor. 

Machine translation pre-editing: preparing source text before machine translation by simplifying sentences, standardising terminology, correcting errors and removing ambiguities. The aim is to improve translation output. Performed by a pre-editor. 

Macro: a set of recorded commands or instructions that automates repetitive tasks in software, executed with a single action or keystroke. 

Metric (in machine translation): a quantitative measure used to evaluate the quality of machine-translated text, often by comparing it to reference translations. Examples include bilingual evaluation understudy (BLEU), metric for evaluation of translation with explicit ordering (METEOR) and National Institute of Standards and Technology (NIST) score. 
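
To give a feel for what "comparing to reference translations" means, the sketch below computes clipped single-word precision, which is one ingredient of BLEU. It is a deliberate simplification, not BLEU itself, which also looks at longer word sequences and penalises output that is too short. The function name and sample sentences are assumptions.

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Share of candidate words that also appear in the reference."""
    cand_words = candidate.lower().split()
    cand_counts = Counter(cand_words)
    ref_counts = Counter(reference.lower().split())
    # A word is credited at most as many times as it occurs in the reference.
    overlap = sum(min(n, ref_counts[word]) for word, n in cand_counts.items())
    return overlap / len(cand_words) if cand_words else 0.0

score = clipped_unigram_precision(
    "the cat sat on the mat",  # machine output
    "the cat is on the mat",   # human reference
)
print(round(score, 2))  # 5 of the 6 candidate words are matched
```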

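Natural language processing (NLP): the area of computational linguistics and AI concerned with enabling computers to analyse and generate human language. Its techniques, e.g. parsing and tokenisation, are used in text mining, machine translation and terminology extraction.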

Neural machine translation (NMT): a type of machine translation that uses artificial neural networks to translate text. It generally produces more fluent, natural-sounding translations than older rule-based and statistical methods.


##O–S


Optical character recognition (OCR): converts scanned images or PDFs into machine-readable text for easy processing and translation. 

Parsing: analysing the grammatical structure of text to identify relationships between words. See also: syntactic analysis, tokenisation. 

Plugin: a software add-on that extends the functionality of a main program. This enables the program to offer new features or tools. 

Post-editing: the process of reviewing and correcting machine-translated text to improve accuracy, fluency and readability before final use. See also: machine translation post-editing (MTPE). 

Productivity tool: software designed to help the user complete tasks more efficiently, manage workflows and improve output quality. 

Scraper: software that automatically extracts data from websites for analysis, storage or reuse. 

SEO translation: adapting content that will be used online for another language while optimising keywords to improve visibility on search engines and in AI search in the target market. 

Sign translation system: a system that translates text found in images of signs, posters, menus, etc. It detects and extracts the text using OCR and automatically translates it into a target language.

Speech-to-text / dictation software: converts spoken language into written text. Can increase productivity and reduce wrist strain. 

String: a sequence of characters, e.g. letters, numbers or symbols, treated as a single unit in computing or text processing. 

String extraction: the process of identifying and retrieving a specific sequence of text (string) from a larger dataset or document. 

Syntactic analysis: examining sentence structure in text to understand grammar and word relationships. See also: parsing, tokenisation. 


##T–Z 


Termbase / terminology database / terminology repository: a database that stores approved terms and their definitions, translations and usage notes to ensure consistency across multilingual content. 

Terminology extraction tool (TET): software that automatically identifies and extracts specialised terms from text to build termbases or support translation and linguistic analysis. See also: automatic terminology extraction (ATE). 

Text aligner: software used to create bilingual resources by aligning source and target segments. See also: aligner. 

Text mining: use of software to extract patterns, terminology or sentiment from large collections of unstructured text. Supports machine translation training and terminology management. Uses include searching for collocations, identifying word frequency and disambiguation.

Text-to-speech (TTS): converts written text into synthetic spoken output. Useful for proofreading and accessibility. 

Tokenisation: splitting text into smaller units called tokens, such as words, parts of words or punctuation marks. It is usually an early step in parsing, after which each token can be given a tag, e.g. its part of speech. See also: parsing, syntactic analysis.
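
A very simple word-level tokeniser can be sketched with one regular expression. The pattern below is an assumption chosen for English-like text. Production tokenisers, including those used by language models, follow more elaborate rules and often split words into smaller pieces.

```python
import re

def tokenise(text):
    """Split `text` into word and punctuation tokens."""
    # A word may contain internal apostrophes or hyphens;
    # any other non-space character becomes its own token.
    return re.findall(r"\w+(?:['’-]\w+)*|[^\w\s]", text)

tokens = tokenise("The translator's CAT tool didn't crash.")
print(tokens)
```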

Translation management system (TMS): platform for managing a translation workflow, including clients, projects, outsourcing and invoicing, among other business activities. 

Translation match: in a CAT tool translation project, a segment that partially (fuzzy match) or fully (100% match, context match) corresponds to previously translated text in a translation memory. Matches help maintain consistency and speed up translation. 
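
The idea of a fuzzy match can be sketched with Python's standard `difflib` module, which scores how similar two strings are on a scale from 0 to 1. This is only an approximation: CAT tools use their own matching algorithms, and the sample memory below is invented for the example.

```python
from difflib import SequenceMatcher

def best_match(segment, memory):
    """Return (score, source, target) for the closest source in `memory`."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    return best

# Invented translation memory: English source mapped to French target.
memory = {
    "Press the green button to start.": "Appuyez sur le bouton vert pour démarrer.",
    "Switch off the device after use.": "Éteignez l'appareil après utilisation.",
}

score, source, target = best_match("Press the red button to start.", memory)
print(f"{score:.0%} match: {source} -> {target}")
```

A score of 1.0 corresponds to a 100% match. Anything lower is a fuzzy match that the translator needs to check and adapt.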

Translation memory (TM): a database of previously translated source–target pairs. Helps translators work faster and ensure better consistency in CAT tools. 

Translation memory exchange (TMX): a standard file format for sharing translation memories between different CAT tools. 
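
For readers curious about what is inside a TMX file, the sketch below builds a minimal one using Python's standard XML library. The element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) come from the TMX format. The header values, the function name and the sample pair are assumptions, and a file meant for real use should be checked against the TMX specification.

```python
import xml.etree.ElementTree as ET

# Full name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, src_lang="en", tgt_lang="fr"):
    """Return a minimal TMX document as a string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

tmx_text = build_tmx([("Hello", "Bonjour")])
print(tmx_text)
```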

Translation package: a bundle of files, resources and metadata prepared for translation. This includes the source text, reference materials and instructions for translators and/or CAT tools. 

Translation workflow: structured sequence of steps that comprise the process of translating content, from source text preparation through translation, review, editing and final delivery. 

Voice dictation technology: general category for systems that allow spoken translation input instead of typing. 

Web spider: software that automatically browses the internet to collect data from websites. In translation, it’s often used to collect multilingual texts or parallel content for building a corpus or translation memory.