
NA-MIC Project Weeks


claude-scientific-skill for Imaging Data Commons

Key Investigators

Project Description

Agent Skills are folders of instructions, scripts, and resources that agents can load when relevant to perform specialized tasks. The claude-scientific-skills repository introduces a development pattern for describing such skills for tools and resources used in scientific research: a human-readable document accompanied by code samples and recipes covering the key functionality of the resource. In addition, the company maintaining that repository makes the resulting skills accessible via an MCP server, which can be connected to an agentic development platform to improve the quality of responses. The goal of this project is to add a new skill covering Imaging Data Commons (IDC) to that repository.
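To make the "folder of instructions, scripts, and resources" idea concrete, here is a minimal sketch that scaffolds such a folder. It assumes the common Agent Skills convention of a `SKILL.md` file whose YAML frontmatter carries `name` and `description`, with longer details placed in a `references/` subfolder that the agent reads only when needed. The `scaffold_skill` helper, the skill name, the description, and the file contents are hypothetical placeholders, not the contents of the actual IDC skill.

```python
from pathlib import Path
import tempfile

# Hypothetical minimal SKILL.md: YAML frontmatter (name, description) followed by
# short instructions that point the agent to longer reference documents.
SKILL_MD = """---
name: {name}
description: {description}
---

# {title}

Keep this file short. Load the files in references/ only when a task needs them.
"""


def scaffold_skill(root: Path, name: str, description: str, references: dict[str, str]) -> Path:
    """Create <root>/<name>/ containing SKILL.md and a references/ folder (hypothetical layout)."""
    skill_dir = root / name
    ref_dir = skill_dir / "references"
    ref_dir.mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        SKILL_MD.format(name=name, description=description, title=name),
        encoding="utf-8",
    )
    # Each reference document holds details kept out of the main SKILL.md.
    for filename, text in references.items():
        (ref_dir / filename).write_text(text, encoding="utf-8")
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = scaffold_skill(
            Path(tmp),
            name="imaging-data-commons",  # hypothetical skill name
            description="Query and download public cancer imaging data from NCI Imaging Data Commons.",
            references={"idc_index_queries.md": "# Example SQL queries\n"},
        )
        for path in sorted(skill.rglob("*")):
            print(path.relative_to(tmp))
```

Keeping `SKILL.md` short and moving details into `references/` keeps the material the agent loads up front small; the layout of the real IDC skill may differ.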

Objective

  1. The IDC skill is available in https://github.com/K-Dense-AI/claude-scientific-skills
  2. Feedback and use cases collected from the community

Approach and Plan

  1. Analyze existing skills and understand best practices.
  2. Develop the IDC skill.
  3. Submit a PR with the IDC skill.
  4. Compare LLM responses with and without the claude-scientific-skills MCP server.
  5. Evaluate the usability of the skill using questions from the IDC forum or other questions from the community.

Progress and Next Steps

  1. Set up Claude.ai with the claude-scientific-skills MCP server and experimented with it.
  2. Started setting up the skill layout and deciding what should be covered.
  3. Submitted a PR with the initial skill: claude-scientific-skills PR #35 (now merged!)
  4. Published a standalone skill: https://github.com/ImagingDataCommons/idc-claude-skill
  5. Discussed with Mike (who merged it with his own IDC skill!). Suggestions for improvement:
    • keep the main skill small, break out details into references
    • look into how Mike's skill manages versioning
    • need to investigate improvements to how IDC BigQuery Parquet files are organized (see https://github.com/ImagingDataCommons/etl_flow/issues/130)
    • need to work on idc-index improvements: publish indices in a GCS bucket (idc-index #229), add a radiomics features table (idc-index #230), and support search via a remote Parquet file (idc-index #331)
    • Claude is known to struggle when too many skills are available (e.g., incorrect skill matching)
  6. Tested the skill on a use case from Leo (how many CT scans does NLST have, and how many of those are segmented with TotalSegmentator?). Lessons learned:
    • the GPT-4o model was useless (e.g., within the same response it corrected its own mistake in one code snippet but not in the other)!
    • GPT-5-codex answered the questions correctly on the first try, supported by correct Python code
  7. Discussed usage issues with K-Dense-AI developers (via Slack):
    • they are aware that Claude struggles once the number of skills grows beyond 300 (although in my experience even 140+ already seems to be too many, at least when accessed via their MCP server)
    • discussed issues related to managing the skill independently vs. as part of their repo
    • see https://k-densecommunity.slack.com/archives/C09RL3JRBSB/p1769554262310839 to join the discussion and learn more
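Leo's question reduces to a join: count the CT series in the NLST collection, then count how many of them are referenced by a TotalSegmentator segmentation series. The offline sketch below runs that logic with `sqlite3` on a toy table. The `series` schema, its column names, the `TotalSegmentator` label, and the row values are all hypothetical stand-ins, not IDC's actual index tables or counts. Against real data, idc-index offers `IDCClient().sql_query()` for running SQL over its own index tables, whose column names should be checked before adapting the query.

```python
import sqlite3

# Toy stand-in for an imaging series index (hypothetical schema and values).
# A SEG row's source_series_uid points to the CT series it segments.
ROWS = [
    # (series_uid, collection_id, modality, analysis_result_id, source_series_uid)
    ("ct1", "nlst", "CT", None, None),
    ("ct2", "nlst", "CT", None, None),
    ("ct3", "nlst", "CT", None, None),
    ("ct4", "other", "CT", None, None),
    ("seg1", "nlst", "SEG", "TotalSegmentator", "ct1"),
    ("seg2", "nlst", "SEG", "TotalSegmentator", "ct2"),
    ("seg3", "nlst", "SEG", "other-model", "ct3"),
]

# LEFT JOIN keeps every NLST CT series; COUNT(DISTINCT ...) avoids double-counting
# a CT series that has several matching segmentations.
QUERY = """
SELECT
    COUNT(DISTINCT ct.series_uid) AS ct_series,
    COUNT(DISTINCT seg.source_series_uid) AS ct_series_segmented
FROM series AS ct
LEFT JOIN series AS seg
    ON seg.source_series_uid = ct.series_uid
   AND seg.modality = 'SEG'
   AND seg.analysis_result_id = 'TotalSegmentator'
WHERE ct.collection_id = 'nlst' AND ct.modality = 'CT'
"""


def count_ct_and_segmented(rows):
    """Return (NLST CT series, NLST CT series with a TotalSegmentator SEG) for the toy schema."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE series (series_uid TEXT, collection_id TEXT, modality TEXT, "
        "analysis_result_id TEXT, source_series_uid TEXT)"
    )
    con.executemany("INSERT INTO series VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    ct_series, segmented = count_ct_and_segmented(ROWS)
    print(f"NLST CT series: {ct_series}; with TotalSegmentator SEG: {segmented}")
```

With the toy rows this prints `NLST CT series: 3; with TotalSegmentator SEG: 2`; the `seg3` row shows why filtering on the analysis result matters, since a segmentation from another model must not be counted.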

Illustrations

No response

Background and References

Coding agents learning materials:

Mike Halle’s IDC skill (works with Claude Code and Claude platform/web/mobile)