Title: The medGIFT project on medical image retrieval
 1The medGIFT project on medical image retrieval
- Medical Imaging and Telemedicine (MIT 2005)
 Henning Müller Medical Informatics Service  
 2Outline
- Geneva hospitals and medical informatics 
- Medical image retrieval 
- Why, how, what? 
- The medGIFT retrieval framework 
- MRML, system integration, ...
- Image pre-processing 
- Needs analysis of medical image users 
- Retrieval system evaluation 
- ImageCLEF benchmarking event 
- Conclusions
3Hospitals and medical informatics 
 4Geneva University Hospitals
- 2,200 beds, 6 hospitals 
- 900 beds in the main clinic 
- 780,000 hospital days 
- 10,000 employees 
- 1,300 MDs 
- 22,000 operations per year 
- 30,000 images per day 
- 6,000 computers 
- Budget > 1 billion/year 
- Research and teaching have high importance 
- Geneva is strong in bioinformatics, genetics, 
 neurosciences
- Service for medical informatics - management 
 informatics
5Medical Informatics Service
- 60 employees, part of radiology 
- vs. administrative informatics 
- 10 persons in research 
- Research areas 
- Multimedia electronic patient record 
- Decision support systems 
- Telemedicine, especially with African countries 
- Knowledge representation, natural language 
 processing, data mining
- Image processing, PACS, operation planning 
- Teaching 
- Postgraduate course in medical informatics 
- Virtual campus for medical students in medical 
 informatics
6Image Retrieval 
 7Image retrieval 
 8Content-based image retrieval
- Based on visual features and visual queries 
- Query by image example, query by sketch, query by 
 region
- Visual features include color histograms, texture 
 descriptors, shape descriptors, etc.
- But query formulation is difficult 
- Page zero problem for query by example 
- Now match visual features and semantics, try 
 object recognition of simple objects
9A medical example 
 10Global structure of retrieval systems 
 11Medical image retrieval: Why?
- Increasing variety and amount of imaging in 
 medicine (diagnostics, treatment planning, 
 follow-up, ...)
- Hard to know everything extremely well 
- Currently, images are mainly accessed by patient 
 ID, used in a single context
- Much information stored in images and connected 
 text
- Little of this knowledge is exploited 
- Case-based reasoning and evidence-based medicine 
 need tools to integrate visual data as well
- Standardized methods, less dependent on MDs' 
 personal experience
12Medical image retrieval: How?
- Create annotated datasets for real tasks such as 
 diagnostic aid (administrative burdens)
- To model expert knowledge 
- Infrastructures and database techniques exist 
- Web-based, ... 
- Visual features and classification/retrieval 
 techniques need to be optimized for the
 problem
- Integrate all knowledge available for a case 
- Visual (several varied images), textual (release 
 letter, etc.), numerical (lab results)
- Include real users (feedback loops)
13Medical image retrieval: What?
- Application for teaching 
- Help lecturers to find images 
- Help students to browse catalogs (continuing 
 education)
- Replace books? Same environment as in the 
 hospital
- Application in research 
- Optimize case selection for studies 
- Include visual features into studies 
- Visual data mining, visual knowledge management 
- Application as diagnostic aid 
- In specialized domains 
- Automation of processes 
- DICOM header correction, automatic annotation
14medGIFT 
 15The GIFT framework
- GIFT = GNU Image Finding Tool 
- Open source, free of charge, Linux 
- Techniques from text retrieval 
- Framework of components to avoid the 
 redevelopment of large parts for every project
- Web-based interfaces 
- MRML = Multimedia Retrieval Markup Language 
- Features can be plugged in, parameterized 
- Feedback schemes 
- Pruning methods, to allow interactive search 
- medGIFT adds utilities and integration into 
 medical applications
16Framework overview 
 17medGIFT
- http://www.sim.hcuge.ch/medgift/ (open source) 
- Project for content-based search in medical image 
 databases
- Goals of the project 
- Better management of visual medical data 
 (retrieval)
- Visual Knowledge Management 
- Textual and visual data 
- Diagnostic aid 
- Specialized retrieval (lung CTs, fractures, 
 dermatologic images)
- Access to PACS data 
- In the short term 
- Research, Teaching
18Interface
Screenshot labels: query image, diagnosis, link to casimage, similarity score
 19Visual features
- Global color histogram (HSV: 18 hues, 3 
 saturations, 3 values, 4 grey levels)
- Color blocks at different scales and locations 
- Histogram of Gabor filter responses 
- 4 directions, 3 scales, quantized into 10 strengths 
- Gabor blocks at different scales and locations 
- 85,000 possible features, 1,000-3,000 features 
 per image, distribution similar to words in text
 collections
- Roughly Zipf distribution
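The global color histogram can be sketched as follows. This is a minimal illustration, not the GIFT implementation: it assumes the parameters mean 18 hue, 3 saturation, and 3 value bins plus 4 separate grey levels, and the saturation threshold `GREY_SATURATION` for treating a pixel as grey is a hypothetical choice.

```python
import colorsys
from collections import Counter

H_BINS, S_BINS, V_BINS, GREY_BINS = 18, 3, 3, 4
GREY_SATURATION = 0.1  # assumption: below this saturation a pixel counts as grey
COLOR_BINS = H_BINS * S_BINS * V_BINS  # 162 color bins, followed by 4 grey bins


def color_bin(r, g, b):
    """Map an 8-bit RGB pixel to one of 162 + 4 = 166 histogram bins."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < GREY_SATURATION:
        # Grey pixels are quantized by brightness only.
        return COLOR_BINS + min(int(v * GREY_BINS), GREY_BINS - 1)
    hi = min(int(h * H_BINS), H_BINS - 1)
    si = min(int(s * S_BINS), S_BINS - 1)
    vi = min(int(v * V_BINS), V_BINS - 1)
    return (hi * S_BINS + si) * V_BINS + vi


def color_histogram(pixels):
    """Return the relative frequency of each occupied bin for (r, g, b) pixels."""
    counts = Counter(color_bin(*p) for p in pixels)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}
```

Only occupied bins are stored, which matches the sparse, text-like view of features: each image carries a small subset of all possible features.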
20Weighting schemes
- Classical tf/idf 
- tf - term frequency 
- cf - collection frequency 
- j - feature number 
- Q - query with i = 1..N input images 
- k - possible result image 
- R - Relevance of an image in a query 
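A minimal sketch of one classical tf/idf form, written with the symbols listed above. The exact medGIFT formula may differ; the log(1/cf) factor and the averaging over the N query images are assumptions made for illustration.

```python
import math


def feature_weights(query, cf):
    """Compute one weight per feature j from a query Q.

    query: list of (tf_i, R_i) pairs, one per input image i = 1..N, where
           tf_i maps feature j -> term frequency and R_i is the relevance
           (positive for relevant, negative for non-relevant images).
    cf:    dict feature j -> collection frequency in (0, 1]; every feature
           used in the query must be present.
    """
    n = len(query)
    weights = {}
    for tf_i, r_i in query:
        for j, tf_ij in tf_i.items():
            weights[j] = weights.get(j, 0.0) + tf_ij * r_i / n
    # Rare features (small cf) receive a larger log(1/cf) factor.
    return {j: w * math.log(1.0 / cf[j]) for j, w in weights.items()}


def score(tf_k, weights):
    """Score a possible result image k against the weighted query features."""
    return sum(tf_kj * weights.get(j, 0.0) for j, tf_kj in tf_k.items())
```

A feature present in every image (cf = 1) contributes nothing, and a feature marked equally often in positive and negative examples cancels out.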
21Combination of visual and textual features
- EasyIR text search engine, also open source 
 (EPFL)
- Frequency-based techniques similar to GIFT 
- Stemming and stop word removal to improve 
 results, also for multilingual search
- Mapping to MeSH terms delivers few terms reliably 
 but high quality results
- Linear combination of normalized results of text 
 and visual system
- The optimal weighting factors vary with the 
 query
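The linear combination of normalized text and visual results can be sketched as below. The normalization by the maximum score and the single mixing factor `alpha` are assumptions for illustration; the slides do not state which normalization or factors were used.

```python
def normalize(scores):
    """Scale scores into [0, 1] by dividing by the maximum score."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {k: 0.0 for k in scores}
    return {k: v / top for k, v in scores.items()}


def combine(visual, textual, alpha=0.5):
    """Rank images by alpha * visual + (1 - alpha) * textual score.

    visual, textual: dicts image id -> raw score from each system.
    Images missing from one system count as 0 for that system.
    """
    v, t = normalize(visual), normalize(textual)
    merged = {
        i: alpha * v.get(i, 0.0) + (1 - alpha) * t.get(i, 0.0)
        for i in set(v) | set(t)
    }
    # Highest combined score first; ties broken by image id.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the best factor depends on the query, `alpha` would in practice be tuned per query type rather than fixed.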
22Relevance feedback
- One-image queries normally do not lead to very 
 good results
- Mainly false positives 
- Several input images improve the query quality 
 enormously
- Negative feedback is extremely important 
- Positive feedback often only reorders the 
 highest-ranked results
- But problems with too much negative feedback in 
 many systems
- Log files of a web demo allow user behavior to 
 be analyzed
- Learning of feature weightings as an additional 
 factor
- Long-term learning from the user interaction 
- Changes of feature sets during feedback 
- First tests promise good results
23Long-term learning
- Learn automatically from user interaction on 
 non-classified databases
- Log files from past interaction are used to 
 improve future results
- Images marked together by users in the same query 
 step are taken into account
- Positive, negative, neutral 
- Images marked together have something in common 
- Learning can include several levels (same user, 
 same database, same domain, ...)
24Using this as additional factor for weighting
- The goal is learning on a feature basis, not on 
 an image basis
- Positive and negative feature occurrences 
- Additional factor in the frequency-based 
 weighting for each feature
- With much feedback a pure probability approach 
 might be possible, as well as on an image level
- Results are improved significantly, although web 
 demo is not reliable
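One way to turn logged feedback into a per-feature factor is sketched below. The counting rule and the ratio `(1 + pos) / (1 + neg)` are hypothetical choices for illustration, not the formula used in medGIFT.

```python
from itertools import combinations


def learn_factors(log, features):
    """Derive a multiplicative factor per feature from past query steps.

    log:      list of query steps, each a dict image id -> mark, where the
              mark is +1 (positive), -1 (negative) or 0 (neutral).
    features: dict image id -> set of feature ids present in that image.
    """
    pos, neg = {}, {}
    for step in log:
        for (a, ra), (b, rb) in combinations(sorted(step.items()), 2):
            if ra > 0 and rb > 0:
                target = pos  # shared features agree with the user's judgment
            elif ra * rb < 0:
                target = neg  # shared features link a relevant and a non-relevant image
            else:
                continue  # neutral marks and negative/negative pairs are skipped
            for j in features[a] & features[b]:
                target[j] = target.get(j, 0) + 1
    seen = set(pos) | set(neg)
    return {j: (1 + pos.get(j, 0)) / (1 + neg.get(j, 0)) for j in seen}
```

The resulting factor could then multiply the frequency-based weight of each feature, so that features shared by jointly relevant images gain influence over time.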
25Casimage: a radiological case database
- Case database for teaching 
- http://www.casimage.com/, interface developed 
 with the proprietary 4D software
- >65,000 images, 9,000 images externally 
 accessible, 500 added per week
- Case descriptions (textual) available in XML 
- Highly variable quality 
- Mix of French and English 
- Interface is compatible with the MIRC (Medical 
 Image Resource Center) standard of the RSNA
26GIFT/casimage 
 27GIFT integration
- medGIFT -> casimage 
- Simple link from image to case 
- Important to get info on images 
- Casimage -> medGIFT 
- Constraint: no change to a routine application 
 running in the hospital
- Simple button under an image with a link opening 
 a new browser window
- PHP interface traces address and downloads the 
 images, then executes a query
28Image pre-treatment 
 29Lung segmentation
- Concentrate visual search on an important region 
 of the image
30Lung block analysis and classification
- Segmentation of the lung 
- Cutting of the lung into blocks 
- Feature extraction from blocks 
- Classification of blocks into several classes 
 (8 in our case)
- Learning database containing 112 annotated 
 regions (1000 blocks of size 32x32)
- Features: co-occurrence matrices, Gabor filters, 
 grey-level histograms, ...
- SVMs reach 84% accuracy healthy/non-healthy, 85% 
 into 8 classes
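The block-cutting and feature-extraction steps can be sketched as follows, assuming the lung segmentation is already available as a binary mask. Only a grey-level histogram is shown; the fraction `min_inside` and the number of histogram bins are hypothetical parameters, and the SVM classification step is not included.

```python
def cut_blocks(image, mask, size=32, min_inside=0.5):
    """Yield (row, col, block) for size x size blocks lying mostly inside the mask.

    image, mask: equally sized lists of rows; mask entries are 0 or 1.
    """
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            inside = sum(
                mask[y][x] for y in range(r, r + size) for x in range(c, c + size)
            )
            if inside >= min_inside * size * size:
                yield r, c, [row[c:c + size] for row in image[r:r + size]]


def grey_histogram(block, bins=8, max_value=255):
    """Return the normalized grey-level histogram of a block of integer pixels."""
    hist = [0] * bins
    for row in block:
        for value in row:
            hist[min(value * bins // (max_value + 1), bins - 1)] += 1
    n = sum(hist)
    return [h / n for h in hist]
```

Each block's feature vector would then be passed to a classifier trained on the annotated regions.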
31Another problem: Noise around object
Hospital logo
Text in the images
Specific problems
Large regions with no information 
 32Object extraction
- Mostly small structures with high frequencies 
- Object in the center, one large connected 
 component
- Remove certain objects specifically (logo, grey 
 square)
- Remove small structures 
- Query only on the image object
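Keeping one large connected component and discarding small structures can be sketched as below. The sketch assumes the image has already been thresholded into a binary foreground map and uses 4-connectivity; both are illustrative choices, and the specific removal of logos or grey squares is not covered.

```python
from collections import deque


def largest_component(binary):
    """Return the set of (row, col) pixels in the largest 4-connected component."""
    rows, cols = len(binary), len(binary[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a non-empty pixel set."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)
```

Cropping the image to this bounding box restricts the query to the image object and drops text, logos, and empty margins outside it.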
33Object extraction steps 
 34Object extraction examples 
 35User needs 
 36User needs
- How to find out what the user really needs? 
- They will not tell you by themselves 
- Future use of images in medicine 
- HON (health on the net) media search 
- Log files from the web search engine 
- Mainly patients searching for information 
- Surveys among various medical professionals 
- Students, librarians 
- Clinicians, researchers, lecturers 
- Survey at OHSU and Geneva among 33 persons 
- Practical experiences when dealing with a PACS
37Log file analysis of HONmedia search
- http://www.hon.ch/HONmedia 
- 2000 searches per month 
- Preliminary results (Jan 2005) 
- More French than English (2/1), mainly 1-3 words 
- Mostly diagnosis and anatomic region, sometimes 
 combined
- Leukemia, tumeur glomique, fracture, ... 
- Many general questions 
- Childbirth, medical images, medical media, ... 
- Also XXX
38Analysis of survey: Questions
- For which tasks are images useful for you? 
- What type of images do you use for each task? 
- Where and how do you search for images? 
- How do you define whether an image found is 
 relevant or not?
- What kind of search would be useful for you? 
- Separately for the following areas: research, 
 clinics, lecturer, student, librarian
- 18 participants in Geneva, 15 in Portland (OHSU) 
- Mainly researcher/clinician/lecturer combined
39Analysis of survey: first results
- Tasks are extremely different depending on 
 department, specific work, and experience
- Mostly diagnostics and conference presentations 
- In diagnostics mainly radiographs and much CT, 
 for research and teaching CTs and illustrations
- Most searching is done in the PACS, but 
 frequently also in Google, our teaching file, 
 and on specialized pages
- Relevance is defined by experience, problems on 
 the web with bad resolution/quality
- Most wanted search by pathology to be added, 
 along with the possibility to find cases 
 similar to a current patient
40Performance Evaluation 
 41Overview image retrieval benchmarks
- Birds-I, Benchathlon 
- SPIE Electronic Imaging 
- Personal proposals 
- C. Leung, ... 
- ImageEval 
- French only 
- ImageCLEF 
- Cross Language Evaluation Forum 
- Four tasks in total, two medical tasks for image 
 retrieval and classification
42CLEF and ImageCLEF
- Located at the Cross Language Evaluation Forum 
 (CLEF)
- Goal is to evaluate the retrieval of images 
 through multi-lingual information retrieval
- And not necessarily based on image information 
- 2003: a first image retrieval task with 4 
 participants
- Queries in different languages than the English 
 collection annotation, image is part of the query
- 2004: 17 participants for two tasks (200 runs) 
- Medical task for visual image retrieval added 
 where the query topic is an image only, and the
 text is English/French mixed
- Evaluation of interactive image retrieval 
- 2005: 24 participants for four tasks, >300 runs, 
 36 registrations
- Medical retrieval and classification tasks
43ImageCLEF 2005 examples
Show me x-ray images with fractures of the femur. Zeige mir Röntgenbilder mit Brüchen des Oberschenkelknochens. Montre-moi des fractures du fémur.
Show me chest CT images with emphysema. Zeige mir Lungen CTs mit einem Emphysem. Montre-moi des CTs pulmonaires avec un emphysème.
Show me any photograph showing malignant melanoma. Zeige mir Bilder bösartiger Melanome. Montre-moi des images de mélanomes malignes.
 44ImageCLEF results
- Resources: 50,000 images for retrieval and 10,000 
 images for classification
- Annotation in English/French/German 
- Query includes text and 1-3 images 
- 3 types of queries (visual, mixed, semantic) 
- Average results are better using text than 
 images, best results are text + visual
- 130 runs submitted, mostly mixed, little feedback 
- Best result: IPAL/I2R (MAP 0.2821) 
- Best visual MAP 0.1455, best textual MAP 0.2084 
- Results vary extremely over queries 
- Classification task: best rate of 87.4% for 57 
 classes
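The retrieval results above are reported as mean average precision (MAP). A minimal sketch of the standard definition is given below; the input structures are illustrative, and official figures would come from the benchmark's own evaluation tooling.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list.

    ranked:   list of image ids in the order returned by the system.
    relevant: set of image ids judged relevant for the query.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Relevant images that are never retrieved still count in the denominator, which is one reason scores vary strongly from query to query.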
45Conclusions 
 46Conclusions
- Content-based medical image retrieval can become 
 important in teaching, research and diagnostics
- To use the inherently stored knowledge of images 
- Integration of various data sources and images 
- More is needed than a technical solution 
- Users need to be included in the development 
- Hospitals need to work with computer science 
 researchers (more communication)
- Standardized evaluation is needed to identify 
 promising techniques
47Questions?