  • Amazon Bedrock Knowledge Bases now supports RAG evaluation (preview)
  • Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (preview)
  • The new capabilities streamline testing and help improve generative AI applications
  • Evaluations assess correctness, helpfulness, and responsible AI criteria such as harmfulness
  • Results include natural language explanations for each score and scores normalized from 0 to 1
  • Workflow for RAG evaluation in Amazon Bedrock Knowledge Bases:
    • Create an evaluation and choose the evaluator (judge) model
    • Select the knowledge base and configure retrieval and response generation, including the model that generates responses
    • Choose metrics such as Helpfulness, Correctness, and Harmfulness
    • Select the dataset and the Amazon S3 location for the results
    • View the evaluation results once the job completes
  • Comparing two RAG evaluations shows whether configuration changes improved the results
  • Workflow for LLM-as-a-judge in Amazon Bedrock Model Evaluation:
    • Create an evaluation and choose the evaluator (judge) model and the generator model
    • Select metrics such as Helpfulness, Correctness, and Harmfulness
    • Specify the dataset location in Amazon S3
    • View the evaluation results once the job completes
  • The new evaluation capabilities are available in preview in select AWS Regions
  • Billed at standard Amazon Bedrock pricing for model inference
  • Optimized for English-language content at launch
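Both workflows above start from a dataset stored in Amazon S3. The sketch below builds JSON Lines records for each case. The record shapes (a `conversationTurns` list for RAG evaluation; flat `prompt` / `referenceResponse` / `category` fields for LLM-as-a-judge) reflect our reading of the Amazon Bedrock documentation and are assumptions to verify before use; the helper names `rag_record`, `judge_record`, and `to_jsonl` are hypothetical.

```python
import json


def rag_record(prompt: str, reference: str) -> dict:
    """One RAG-evaluation dataset entry (assumed `conversationTurns` schema)."""
    return {
        "conversationTurns": [
            {
                "prompt": {"content": [{"text": prompt}]},
                "referenceResponses": [{"content": [{"text": reference}]}],
            }
        ]
    }


def judge_record(prompt: str, reference: str, category: str) -> dict:
    """One LLM-as-a-judge dataset entry (assumed flat schema)."""
    return {"prompt": prompt, "referenceResponse": reference, "category": category}


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    # Hypothetical sample row; save the output to a file and upload it to S3.
    rows = [rag_record("What is Amazon Bedrock?", "A managed service for foundation models.")]
    print(to_jsonl(rows), end="")
```

Keeping each prompt and its reference answer on one line lets the judge model compare the generated response against the reference for metrics such as Correctness.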
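The RAG evaluation workflow listed above (evaluator model, knowledge base with retrieval and response generation, metrics, dataset, S3 output) can also be expressed as a single API request. This is a minimal sketch, assuming the request fields of the Bedrock `CreateEvaluationJob` API as we understand them; the `taskType` value, the dataset display name, and the helper name `rag_evaluation_request` are assumptions, and all IDs and ARNs are placeholders you supply.

```python
def rag_evaluation_request(
    job_name: str,
    role_arn: str,
    knowledge_base_id: str,
    generator_model_arn: str,
    evaluator_model_id: str,
    dataset_s3_uri: str,
    output_s3_uri: str,
    metric_names: tuple[str, ...] = (
        "Builtin.Helpfulness",
        "Builtin.Correctness",
        "Builtin.Harmfulness",
    ),
) -> dict:
    """Keyword arguments for a retrieve-and-generate RAG evaluation job (sketch)."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "applicationType": "RagEvaluation",
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [
                    {
                        # Assumed task type for judge-based jobs; verify in the API docs.
                        "taskType": "General",
                        "dataset": {
                            "name": "RagEvalDataset",  # hypothetical display name
                            "datasetLocation": {"s3Uri": dataset_s3_uri},
                        },
                        "metricNames": list(metric_names),
                    }
                ],
                # The evaluator ("judge") model that scores each response.
                "evaluatorModelConfig": {
                    "bedrockEvaluatorModels": [{"modelIdentifier": evaluator_model_id}]
                },
            }
        },
        # Retrieval from the knowledge base plus generation by the generator model.
        "inferenceConfig": {
            "ragConfigs": [
                {
                    "knowledgeBaseConfig": {
                        "retrieveAndGenerateConfig": {
                            "type": "KNOWLEDGE_BASE",
                            "knowledgeBaseConfiguration": {
                                "knowledgeBaseId": knowledge_base_id,
                                "modelArn": generator_model_arn,
                            },
                        }
                    }
                }
            ]
        },
        # S3 prefix where the evaluation results are written.
        "outputDataConfig": {"s3Uri": output_s3_uri},
    }
```

With boto3 installed and credentials configured, the dict would be submitted as `boto3.client("bedrock").create_evaluation_job(**request)`. Building the request in a pure function keeps it easy to inspect and test before anything is sent to AWS.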
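The LLM-as-a-judge workflow in Model Evaluation differs from the RAG workflow mainly in that a generator model, rather than a knowledge base, produces the responses. A minimal sketch, assuming the same `CreateEvaluationJob` request shape with `applicationType` set to `ModelEvaluation`; the `taskType` value, the inner shape of the `inferenceParams` JSON string, and the helper name `judge_evaluation_request` are assumptions to check against the API reference.

```python
import json


def judge_evaluation_request(
    job_name: str,
    role_arn: str,
    generator_model_id: str,
    evaluator_model_id: str,
    dataset_s3_uri: str,
    output_s3_uri: str,
    metric_names: tuple[str, ...] = (
        "Builtin.Helpfulness",
        "Builtin.Correctness",
        "Builtin.Harmfulness",
    ),
    max_tokens: int = 512,
) -> dict:
    """Keyword arguments for an LLM-as-a-judge model evaluation job (sketch)."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "applicationType": "ModelEvaluation",
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [
                    {
                        "taskType": "General",  # assumed value; verify in the API docs
                        "dataset": {
                            "name": "JudgeEvalDataset",  # hypothetical display name
                            "datasetLocation": {"s3Uri": dataset_s3_uri},
                        },
                        "metricNames": list(metric_names),
                    }
                ],
                # The judge model that grades the generator's responses.
                "evaluatorModelConfig": {
                    "bedrockEvaluatorModels": [{"modelIdentifier": evaluator_model_id}]
                },
            }
        },
        # The generator model whose responses are evaluated.
        "inferenceConfig": {
            "models": [
                {
                    "bedrockModel": {
                        "modelIdentifier": generator_model_id,
                        # A JSON string; this inner structure is an assumption.
                        "inferenceParams": json.dumps(
                            {"inferenceConfig": {"maxTokens": max_tokens}}
                        ),
                    }
                }
            ]
        },
        "outputDataConfig": {"s3Uri": output_s3_uri},
    }
```

As with the RAG request, the resulting dict would be passed to `create_evaluation_job` on a boto3 `bedrock` client.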
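Comparing two RAG evaluations, as noted in the list above, comes down to per-metric differences between runs. This sketch assumes you have already reduced each run's output to a dict of average scores normalized to [0, 1] (extracting them from the result files in S3 is not shown). The function name `compare_runs`, the metric keys, and the sample values are hypothetical; whether a metric such as Harmfulness should be treated as lower-is-better depends on its definition, so check before passing it in.

```python
def compare_runs(
    baseline: dict[str, float],
    candidate: dict[str, float],
    lower_is_better: frozenset[str] = frozenset(),
) -> dict[str, float]:
    """Per-metric improvement of `candidate` over `baseline`.

    A positive value means the candidate run did better. Metrics listed in
    `lower_is_better` have their sign flipped. Only metrics present in both
    runs are compared, in alphabetical order.
    """
    report = {}
    for metric in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[metric] - baseline[metric]
        report[metric] = round(-delta if metric in lower_is_better else delta, 4)
    return report


if __name__ == "__main__":
    # Hypothetical averaged scores from two evaluation runs.
    before = {"Helpfulness": 0.70, "Correctness": 0.80}
    after = {"Helpfulness": 0.75, "Correctness": 0.78}
    print(compare_runs(before, after))
```

Restricting the report to shared metrics avoids misleading entries when the two runs were configured with different metric sets.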

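My thoughts:
The new evaluation capabilities help streamline testing and improve generative AI applications. Evaluations assess criteria such as correctness, helpfulness, and responsible AI, and provide natural language explanations and normalized scores. The RAG evaluation and LLM-as-a-judge workflows in Amazon Bedrock Knowledge Bases and Model Evaluation are intuitive and easy to configure, and they help in understanding how results improve. Pricing follows standard Amazon Bedrock pricing, and although the capabilities are optimized for English content, other languages can also be used.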

Original article: https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/