Evaluating Machine Translation: Finding The Right Fit For Your Needs
Machine Translation (MT), which uses Artificial Intelligence (AI) to automatically translate content from one language to another, is now a vital tool for businesses and individuals looking for efficient language translation solutions.
As a Language Services Provider (LSP), we know the benefits of MT, but we also understand our clients’ concerns about making sure the technology works for them. Not just for their industry, but for their specific business and their growth plans. We hear all the questions about output quality and the challenges businesses face in finding the right fit.
In this article, we'll delve into the world of MT evaluation, discussing both human-based and automatic evaluation approaches. We’ll also get practical and show you how, at LinguaLinx, we take a comprehensive approach to optimizing MT output.
Where Are You On The Spectrum?
Simply put, "machine translation" is a broad term. It's like saying you need a car, but do you just need it for daily tasks like taking your kids to school, or are you aiming for high performance like a racing team? Both are cars, but they serve very different purposes.
With MT, are you a multi-national looking to convert tens of thousands of pieces of content into another language as you throw yourself into a new market, or are you launching a new product and just dipping a toe into new territory to test the waters?
At one end of the spectrum, MT relies on a high level of post-editing, which involves extensive human review. At the other end of the spectrum, it can be totally autonomous, needing no human intervention, with various automatic evaluation methods to assess translation quality.
The MT Quality Challenge
MT has come a long way, thanks to advances in AI and neural networks. Yet, the quest for the perfect MT output remains elusive.
Why? Well, the real challenge lies in evaluating and ensuring the quality of machine-generated translations. That still needs a human element, which makes consistency and scalability a struggle to achieve.
Human-Based Evaluation Methods
The most common method of evaluating MT output is human-based assessment. This is as simple as linguists or bilingual experts reviewing and rating translations. This approach offers valuable insights into translation quality but is time-consuming, costly, and prone to subjectivity.
To combat the subjectivity, multiple linguists are needed. This adds more time and more costs.
There are two ways human intervention usually happens:
- Comparative Ranking - Human evaluators rank candidate translations in order of quality. Subjectivity creeps in here, though, so this may not yield consistent results across different evaluators.
- Sentence-Level Scoring - This involves assigning numerical scores to individual sentences. While this method provides detailed feedback, it can be labor-intensive, especially for large amounts of text (see the short sketch after this list for one way multiple scores can be combined).
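To make the combination step concrete, here is a minimal sketch of how sentence-level scores from several evaluators might be averaged, with disagreement flagged for a second look. The evaluator names and the 1-5 scale are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch: combining sentence-level scores from several evaluators.
# The evaluator names and the 1-5 scale are illustrative assumptions.
from statistics import mean, stdev

# Each evaluator scores every sentence on a 1-5 quality scale.
scores = {
    "evaluator_a": [4, 5, 3, 4],
    "evaluator_b": [4, 4, 3, 5],
    "evaluator_c": [5, 4, 2, 4],
}

num_sentences = len(next(iter(scores.values())))
for i in range(num_sentences):
    per_sentence = [ratings[i] for ratings in scores.values()]
    avg = mean(per_sentence)
    spread = stdev(per_sentence)  # a high spread flags disagreement worth a second review
    print(f"Sentence {i + 1}: mean={avg:.2f}, stdev={spread:.2f}")
```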
The reality is that both methods have a degree of subjectivity, but all translation does (after all, there’s a linguistic skill here), so it’s not necessarily a bad thing.
Automatic Evaluation Methods
Here, we have a chance to address subjectivity, reduce cost, and increase speed while minimizing the limitations of human-based evaluation. This is possible through the development of automatic evaluation metrics.
These metrics use algorithms to assess and score machine translation outputs, making the evaluation process more objective and efficient.
Some widely used automatic evaluation metrics include:
- BLEU (Bilingual Evaluation Understudy) - BLEU calculates the overlap of n-grams (short sequences of adjacent words or tokens) between reference translations and machine-generated translations. It's a widely adopted metric but may not always reflect translation quality accurately (see the scoring sketch after this list).
- NIST (National Institute of Standards and Technology) - NIST builds on BLEU but weights n-gram matches by how informative they are, so matching rarer n-grams counts for more. Like BLEU, it has its limitations.
- METEOR (Metric for Evaluation of Translation with Explicit Ordering) - METEOR considers precision, recall, stemming, synonymy, and word order to evaluate translations. It offers a more comprehensive assessment but can still be imperfect.
- TER (Translation Error Rate) - TER focuses on the number of edits needed to transform the machine-generated translation into a reference translation. It's a valuable metric for identifying errors but may not capture fluency and coherence.
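If you want to see what these scores look like in practice, here is a minimal sketch using the open-source sacrebleu Python library (installed with pip install sacrebleu) to compute corpus-level BLEU and TER. The example sentences are placeholders; a real evaluation needs a representative test set with trusted reference translations.

```python
# Minimal sketch: scoring MT output against references with sacrebleu.
# The hypothesis and reference sentences below are illustrative placeholders.
from sacrebleu.metrics import BLEU, TER

hypotheses = [
    "The cat sat on the mat.",
    "He did not go to the market yesterday.",
]
# One inner list per reference set, aligned sentence-by-sentence with the hypotheses.
references = [
    [
        "The cat is sitting on the mat.",
        "He didn't go to the market yesterday.",
    ],
]

bleu = BLEU()
ter = TER()
print(bleu.corpus_score(hypotheses, references))  # higher BLEU suggests more n-gram overlap
print(ter.corpus_score(hypotheses, references))   # lower TER means fewer edits needed
```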
What does all this mean? Simply put, there are several evaluation methods to consider, and this is where your LSP's expertise helps guide you and clarify the technical terms.
LinguaLinx's Solution
This article wouldn’t be complete if we didn’t plant our flag and explain our approach. We accept that no single evaluation method is a one-size-fits-all solution for every use case, so what we do is offer a comprehensive approach to optimizing MT outputs.
Here's how we do it:
- Diverse Machine Engines - We provide access to one of the most diverse sets of machine engines in the market. This means we can select the best engine for your specific language pair (the source and target languages) and use case (what’s being translated?) based on existing research, maximizing the chances of high-quality translations from the start.
- Source Quality Improvement - Before translation even begins, we focus on improving the quality of the source content. Clean, well-structured source text leads to better MT results, and generative AI is a great tool for cleaning up and structuring source content in preparation for translation (a minimal sketch follows this list).
- Customized Evaluation - Once your MT output is optimized, we work with you to explore various evaluation methods. By tailoring our approach to your needs, we save you time and money while ensuring that your translations meet your standards and those of your end users.
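By way of illustration only, and not a description of LinguaLinx's actual pipeline, here is a minimal sketch of how a generative model could tidy source text before it goes to MT. It assumes the openai Python SDK and an API key; the model name and prompt are assumptions you would adapt.

```python
# Illustrative sketch only: using a generative model to tidy source text before MT.
# Assumes the openai Python SDK and an OPENAI_API_KEY in the environment;
# the model name and prompt are assumptions, not LinguaLinx's actual pipeline.
from openai import OpenAI

client = OpenAI()

def clean_source(text: str) -> str:
    """Ask the model to fix typos and simplify structure without changing meaning."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model could be used
        messages=[
            {
                "role": "system",
                "content": (
                    "Correct spelling and grammar and simplify sentence structure. "
                    "Do not add or remove information."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(clean_source("Our prodcut, wich launch next quarter, it will helps teams colaborate."))
```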
Why It Makes Sense
Our approach isn’t rocket science, but a lot of the industry is too invested in one style of MT and one way of evaluating it, regardless of their clients’ needs. We don’t do that, because we prefer to position ourselves as a boutique arm of our clients’ businesses.
Evaluating MT is a critical step in ensuring the quality of translated content. And there’s no point in translating content unless it’s a quality level suitable for its end purpose.
This doesn’t mean reducing quality; it just means making sure you’re getting value for your money. Is it text for a public-facing website that’s being translated, or metadata that hardly anyone will ever see? Two use cases, two totally different ways of defining quality.
We take an approach that combines access to diverse machine engines, source content quality improvement, and customized evaluation methods. We want our clients to be confident that their MT meets their standards and engages with their audience, while saving them time and resources in the process.
Get A Quote For Your Translation Needs
If you need a partner to help you understand how MT can help your business, we’d love to sit down and talk with you about it.
Consultations are free and there’s no obligation.
With LinguaLinx, you won't ever have to worry about your message getting lost as it’s translated. You know you're in good hands with our ISO 17100, ISO 9001 compliance, twenty years of professional translation experience, and the organizations whose trust we've earned.