- Amazon Bedrock Knowledge Bases now supports RAG evaluation (preview)
- Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (preview)
- New capabilities streamline testing and improve generative AI applications
- Evaluations assess correctness, helpfulness, and responsible AI criteria
- Results provide natural language explanations and normalized scores
- RAG evaluations in Amazon Bedrock Knowledge Bases workflow (scripted sketch after these steps):
- Create evaluation, choose Evaluator and Model
- Select knowledge base, configure retrieval and response generation
- Choose metrics (Helpfulness, Correctness, Harmfulness)
- Select dataset and S3 location for results
- Access evaluation results after completion
- Comparing RAG evaluation jobs helps you understand how changes to the knowledge base or its configuration improve results
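The console steps above can also be scripted. Below is a minimal sketch using the boto3 `bedrock` client's `create_evaluation_job` API; the role ARN, S3 URIs, knowledge base ID, model identifiers, task type, and metric names (e.g. `Builtin.Helpfulness`) are illustrative assumptions, not values from the announcement, so check the current API reference before use.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder resources -- replace with your own (assumptions, not real values).
KNOWLEDGE_BASE_ID = "MYKBID12345"
EVALUATOR_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"   # LLM-as-a-judge model
GENERATOR_MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
ROLE_ARN = "arn:aws:iam::123456789012:role/BedrockEvalRole"

response = bedrock.create_evaluation_job(
    jobName="kb-rag-eval-demo",
    roleArn=ROLE_ARN,
    applicationType="RagEvaluation",
    evaluationConfig={
        "automated": {
            # Judge model that scores the retrieved-and-generated responses
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{"modelIdentifier": EVALUATOR_MODEL}]
            },
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",  # assumption; verify valid task types
                    "dataset": {
                        "name": "rag-eval-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-eval-bucket/datasets/prompts.jsonl"},
                    },
                    # Metric names inferred from the console labels (assumption)
                    "metricNames": [
                        "Builtin.Helpfulness",
                        "Builtin.Correctness",
                        "Builtin.Harmfulness",
                    ],
                }
            ],
        }
    },
    inferenceConfig={
        # Retrieve-and-generate against the knowledge base under test
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                            "modelArn": GENERATOR_MODEL_ARN,
                        },
                    }
                }
            }
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-eval-bucket/results/"},
)
print("Started RAG evaluation job:", response["jobArn"])
```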
- LLM-as-a-judge in Amazon Bedrock Model Evaluation workflow (scripted sketch after these steps):
- Create evaluation, choose Evaluator and Generator model
- Select metrics (Helpfulness, Correctness, Harmfulness)
- Specify dataset location in Amazon S3
- Access evaluation results after completion
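An LLM-as-a-judge model evaluation job uses the same `create_evaluation_job` API, with `inferenceConfig` pointing at the generator model instead of a knowledge base; as above, the ARNs, S3 URIs, task type, and metric names are illustrative assumptions. The sketch also polls `get_evaluation_job` until the job finishes, after which results are available in the console and at the S3 output location.

```python
import time
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder role, models, and S3 locations -- substitute your own.
response = bedrock.create_evaluation_job(
    jobName="llm-judge-eval-demo",
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",
    applicationType="ModelEvaluation",
    evaluationConfig={
        "automated": {
            # Judge (evaluator) model that grades the generator's responses
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-haiku-20240307-v1:0"}
                ]
            },
            "datasetMetricConfigs": [
                {
                    "taskType": "Generation",  # assumption; verify valid task types
                    "dataset": {
                        "name": "judge-eval-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-eval-bucket/datasets/prompts.jsonl"},
                    },
                    "metricNames": [
                        "Builtin.Helpfulness",
                        "Builtin.Correctness",
                        "Builtin.Harmfulness",
                    ],
                }
            ],
        }
    },
    inferenceConfig={
        # Generator model whose outputs are being judged
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.titan-text-premier-v1:0"}}
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-eval-bucket/results/"},
)

# Poll until the evaluation job reaches a terminal state.
job_arn = response["jobArn"]
while True:
    status = bedrock.get_evaluation_job(jobIdentifier=job_arn)["status"]
    print("status:", status)
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)
```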
- New evaluation capabilities available in preview in specific AWS Regions
- Pricing based on standard Amazon Bedrock pricing for model inference
- Optimized for English language content at launch
My thoughts:
The new evaluation capabilities help streamline testing and improve generative AI applications. Evaluations assess criteria such as correctness, helpfulness, and responsible AI, and return natural-language explanations along with normalized scores. The Amazon Bedrock Knowledge Bases and Model Evaluation workflows that use RAG evaluation and LLM-as-a-judge are intuitive to configure and make it easy to understand how results improve. Pricing follows standard Amazon Bedrock pricing, and while the features are optimized for English content at launch, other languages may also be usable.