Title: Optical Character Recognition
1. Optical Character Recognition
- Proposer: Peeyush Bajpai
- Name of the company: Indicus NetLabs Private Limited
- Language/Language pair: English to Hindi
- Category of contribution: Development and Spread of OCR in Hindi
2. Strengths
[Diagram: Indicus at the centre, surrounded by its organizational strengths]
- Organization
- Technical Capabilities
- Focused Dedicated Team
- Past Experience
3. Organization
Indicus Analytics
Indicus NetLabs
- Part of the development process
- Facilitating the process for everyone to gain
- Entrepreneurial in our aiding development activities
- Credibility
- Media Coverage
- Client Orientation
- Highly exacting clients in both software and research
- Working with Researchers (Academics and Policy Makers)
- Focus on Indian Language Technology
- Facilitating India's Development
- Passion to make a difference
4. Business Volume/Financial Details
Indicus Analytics
- Indicus NetLabs Pvt. Ltd. has been spun off recently from Indicus Analytics
- Indicus Analytics has been in existence for 5 years
- Five-fold increase in revenues in as many years
- Revenues for 2004-05 > Rs 1 Cr.
- Consistently increasing profitability (net profit/revenues)
- Many IT related studies
- Software for gauging competitiveness of SSIs: UNDP/FICCI
- Consumer Tracker: purchased by Maruti, Bata, Parle, ITC etc.
- Market Skyline: purchased by more than 300 corporates
- Indicat: socio-economic data analyser and mapping software
5. Technical Capabilities
- Over 15 years individual experience in Software Industry
- Over 5 years experience in research oriented work
- Over 5 years experience in database analysis, management and development
- First Integrated Hindi Search Engine: www.Raftaar.com
- Comprehensive understanding of language related complexities
- Experience across various technical platforms, languages and environments
6. OCR Development Methodology
[Diagram: OCR development pipeline. Labels recovered from the figure:]
- Input: Scanning, Import
- OCR Engine: Page Layout Analysis Engine, Visual Component Extraction Engine, Visual Component Recognizer Engine, Post Processor Engine
- Resources: Training and Testing Data, Language Corpus, Rules Database
- Output: Export document
- Formats: METS/TEI, PDF, TIFF, JPEG
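The stages named in the methodology diagram can be read as a chain in which each engine feeds the next. The sketch below is only an illustration of that chaining, not the proposers' implementation: the function names mirror the diagram labels, but every body is a hypothetical stand-in (a page is plain text rather than a scanned image), and the post-processor's use of Python's `difflib` to snap unknown words to the nearest corpus word is an assumed technique, with an assumed similarity cutoff.

```python
from difflib import get_close_matches
from typing import Callable


def page_layout_analysis(page: str) -> list[str]:
    """Split a page into text-line regions (stand-in: keep non-empty lines)."""
    return [line for line in page.splitlines() if line.strip()]


def visual_component_extraction(regions: list[str]) -> list[str]:
    """Cut each region into word-sized components (stand-in: whitespace split)."""
    return [token for region in regions for token in region.split()]


def visual_component_recognizer(components: list[str]) -> list[str]:
    """Turn components into candidate words (stand-in: identity, no real recognition)."""
    return list(components)


def make_post_processor(corpus: set[str], cutoff: float = 0.6) -> Callable[[list[str]], list[str]]:
    """Build a post-processor that replaces unknown words with the closest corpus word."""
    vocabulary = sorted(corpus)

    def post_process(words: list[str]) -> list[str]:
        fixed = []
        for word in words:
            if word in corpus:
                fixed.append(word)
                continue
            # Assumed correction rule: best fuzzy match above the cutoff, else keep the word.
            match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
            fixed.append(match[0] if match else word)
        return fixed

    return post_process


def run_ocr(page: str, corpus: set[str]) -> list[str]:
    """Chain the stages in the order the diagram lists them."""
    regions = page_layout_analysis(page)
    components = visual_component_extraction(regions)
    words = visual_component_recognizer(components)
    return make_post_processor(corpus)(words)
```

Calling `run_ocr(page_text, corpus_words)` returns the recognized words after correction. Keeping each engine as a separate function means a stand-in can be swapped for a real component without touching the others.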
8. OCR Development Capabilities
- Understanding of fonts and the associated glyphs
- Experience in mapping all the alphabets of Devanagari (Hindi) for the majority of the currently used fonts
- Development of a tool to understand PDF-based information
- Experience in working with Unicode and INSROT
- Experience in developing a Hindi corpus in Unicode. Our current web crawlers are continuously updating the corpus.
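Mapping font glyphs to Unicode, as described above, usually involves more than a lookup table: many legacy Hindi fonts store the short-i vowel sign before its consonant (visual order), while Unicode stores it after (logical order). The sketch below illustrates both steps under stated assumptions. The glyph table is hypothetical and belongs to no real font, and the reordering is simplified to a single following character, so it ignores conjuncts.

```python
# Hypothetical glyph table for an imaginary legacy font: each ASCII slot
# displays one Devanagari glyph. A real font needs its own, much larger table.
GLYPH_TO_UNICODE = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "t": "\u0924",  # DEVANAGARI LETTER TA
    "A": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "I": "\u093F",  # DEVANAGARI VOWEL SIGN I (typed before its consonant)
}

PREBASE_MATRA = "\u093F"


def legacy_to_unicode(text: str) -> str:
    """Convert legacy-font text to Unicode, moving the pre-base matra after
    the character that follows it (simplified: no conjunct handling)."""
    out = []
    pending = ""  # matra(s) waiting for their consonant
    for ch in text:
        mapped = GLYPH_TO_UNICODE.get(ch, ch)  # unmapped characters pass through
        if mapped == PREBASE_MATRA:
            pending += mapped
            continue
        out.append(mapped)
        if pending:
            out.append(pending)
            pending = ""
    out.append(pending)  # empty unless the input ended with a dangling matra
    return "".join(out)
```

With this table, `legacy_to_unicode("Iml")` yields the three code points MA, VOWEL SIGN I, LA in logical order. Text converted this way can be stored and searched uniformly regardless of the font it was originally typed in.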
9. Decision Process
- Information Collation
- Brain Storming with Team
- Decision Alternatives
- Decision taken by assigned entity
- Technical: P. Srinivasan
- Operational: P. Srinivasan, Peeyush Bajpai
- Organizational: Peeyush Bajpai, Laveesh Bhandari
- Financial: Laveesh Bhandari
10. Manpower Involved
- Project Advisor: Dr. Laveesh Bhandari
- Overall Operations: Peeyush Bajpai
- Technical Development: P. Srinivasan
- Development Team: A combination of software engineers and researchers
- Coordination: Mamtesh Kumar
- Business Development: Peeyush Bajpai
- Media Networking: Kapila Chaplot
12. Marketability
- Product/ Service
- Demand
- Value Proposition
- Client Orientation
- 5 Ps of marketing
- Credibility
- GOI
- IIT
- CDAC
- Indicus and its networks with policymakers,
media, and corporate
- Many of our studies have been released and
referred to by eminent Indians including the
President Dr. APJ Abdul Kalam, the Vice-President
Shri Bhairon Singh Shekhawat, the Prime Minister
Dr. Manmohan Singh, the Finance Minister Dr. P.
Chidambaram, the Panchayati Raj Minister Shri
Mani Shankar Aiyar, former Deputy Prime Minister
Shri L.K. Advani, and many others
13. Marketability
- Media Coverage
- Hindi Dailies: Dainik Jagran, Hindustan, Jansatta, Prabhat Khabar
- English Dailies: Indian Express, Telegraph, Chronicle, Times of India
- Business Dailies: Economic Times, Business Standard, Financial Express
- Magazines: India Today, Outlook
- Past Clients (one example each)
- The Government: The 12th Finance Commission
- The Media: India Today
- International Academia: Stanford University
- Development Institutions: The World Bank
- International Aid Organizations: DFID, Government of UK
- NGOs: Liberty Foundation
- Indian Research Institutions: Rajiv Gandhi Institute of Contemporary Studies
- Networks: 3I Network (IIM A, IIT (Kanpur) and IDFC)
- Companies: Hindustan Lever
- Associations: Confederation of Indian Industry
14. Interaction with Universities/R&D Institutions/Academics
- Universities
- Harvard University
- Stanford University
- University of Delhi
- University of Newcastle
- University of Texas
- Social Science Research Centre, Berlin
- University of East Anglia
- University of Maryland
- University of California (Santa Cruz)
- Development Institutions
- The World Bank
- USAID
- DFID
- UNDP
- Indian Research Institutions
- Rajiv Gandhi Institute of Contemporary Studies
- 3I Network of IIM (Ahmedabad), IIT (Kanpur) and
IDFC
15. Raftaar.com (First integrated search engine in Hindi)
- A simple user interface to type in Hindi (even a primary school dropout can use it)
- Font hassle free: gets information from sites irrespective of the font
- Regular update of the index for the latest and most relevant information