Title: Microdata in China
1. Micro-data in China
- Sarah Cook
- Institute of Development Studies
- and
- James Keeley
- International Institute for Environment and Development
- University of Sussex
2. The Study: key questions
- Through extensive interviews and review of data sources:
- What are the main national social science data sources available through government agencies at national and sub-national levels?
- What other social science data sources are available (e.g. sample surveys by academic institutions, commercial survey groups, etc.)?
- What are the current restrictions on access to and use of social science data by overseas researchers?
- What are the options for negotiating access to and use of such data, whether for public access or on a case-by-case basis for specific projects?
3. Outline of key issues
- Availability: China collects good-quality national survey data, administrative data and smaller sample surveys.
- However, there is limited availability of meta-data or accompanying information.
- The quality of data is reasonably high, although problems can be identified.
- The main issue is one of accessibility: macro / aggregate data are readily accessible, but micro-data are extremely difficult for Chinese or international scholars to use.
- Various constraints and barriers to access can be identified, but opportunities exist for greater sharing within China and internationally.
4. China's data collection system
- National Bureau of Statistics (NBS): responsible for national statistics and national accounts
- NBS is also the regulatory and supervisory body for data collection and implementation of the National Statistics Law
- Administrative data (collected by government agencies)
- Academic and research institutions (including government, university, private)
- Other (e.g. public opinion firms, market research)
5. Key categories of micro-data
- censuses (e.g. population, economic, and agricultural censuses) (NBS)
- national sample surveys, including 1% and 1‰ sample surveys of demographic change; labour force; rural and urban households, etc. (NBS)
- administrative data reported by government line agencies
- surveys undertaken by government agencies, e.g. Ministry of Agriculture (RCRE) longitudinal fixed-point survey; Ministry of Health (often in conjunction with NBS)
- small-scale sample surveys undertaken by academic, government and other institutions. Only a few have nationally representative samples; many are of limited value due to the quality of sampling or implementation.
6. Examples of National Data: Population, Labour and Health
7. Data are available but hard to access
- Most micro-data collected by government are extremely hard and costly to access.
- Easy access only to macro (aggregate) data, which creates problems even for a general descriptive overview (e.g. lack of gender disaggregation).
- Limited availability of good-quality meta-data.
- High transaction costs of negotiating terms and cost of use (including for Chinese users).
- Limited interaction between users and producers of data.
8. Some data are more easily accessible
- Small sample surveys collected by leading research institutions.
- Data supported by international funders: terms of funding often require open access (subject to some conditions).
- Numerous small-scale surveys exist, but only a limited number are of sufficiently good quality.
- Those by leading institutions and with international funding tend to be high quality.
- These are often accessible for a small fee, through personal relationships or through a collaborative project.
- For example:
9. Data collected and held by CASS Institute of Population and Labour Economics
10. Internationally funded data sets
11. Summary: availability and access
- Almost all micro-data in China are difficult to access: statistical data, administrative data, and even data collected by researchers.
- The current environment is not conducive to direct access to data, especially nationally representative micro-data sets collected by NBS.
- The major exception is research survey data involving Chinese academics where there has been international funding and where open access to the data has been stipulated by funders.
12. What are the barriers to access?
- Restrictions of the legal environment, in particular the content or interpretation of the Statistical Law.
- Specific conditions relating to providing social science data to foreigners: special permissions are needed from NBS if foreigners are to be involved in a survey / data collection, or to be provided with primary data.
- Risk of making data available: potential misuse of data for which data collectors may ultimately be held responsible.
- Attitudes towards data sharing and its public use.
- Data viewed as a marketable commodity, not as a public good.
13. Barriers to access: legal environment
- Laws and regulations are a critical issue both for those seeking access to micro-data (users) and for those seeking to place data in the public domain (producers).
- Four main types of regulation limit access to original data:
  - rules on data protection and protecting the anonymity of data subjects
  - rules on state secrets, endangering national security or economic and social stability
  - rules on who can carry out surveys
  - rules on releasing data to foreigners
- Lack of clarity means that collectors of original data could be blamed for misuse of the data by others.
14. Risks to data producers
- Grey areas or lack of clarity in relation to what counts as sensitive information, inappropriate use of information, and who can be held responsible.
- Data viewed as politically sensitive include, for instance, school dropout rates and unemployment data.
- The result is that both the government and academic research communities are risk-averse in sharing information.
- Researchers can present analysis based on primary data in academic and government policy communities, but this might become more sensitive if picked up in the media and, e.g., used to highlight social problems.
- Researchers aim to avoid negative repercussions if analysis by a secondary user results in media or other attention.
15. Other barriers to access
- Data are not regarded as a public good (even when publicly funded); data are often not shared even among government or NBS departments.
- Cultural attitudes place little value on sharing data.
- Competition among researchers or institutions for publications and outputs.
- Little recognition attached to production of good-quality data (e.g. if used by other researchers).
- Data as a commodity: use of data as an economic resource to generate income.
- Few incentives for cooperation, e.g. in creating data banks for shared use.
16. Practical obstacles to making data public
- Lack of accessible information about what data exist in particular fields.
- Lack of good meta-data, especially in English.
- Costs of preparing data for public use.
- Potential time needed for responding to questions from other users and managing public-access data.
- High costs of translating associated descriptive materials or meta-data.
- No institutions or funding sources dedicated to managing data in the public domain or funding the above activities.
17. Problems facing users
- Chinese and international researchers both face problems of access.
- Domestically there is limited sharing across research institutions and limited access to government or NBS data.
- Especially for international researchers, there are difficulties in accessing information about existing data sets and related meta-data.
- Data use generally needs to be negotiated (and paid for) on a case-by-case basis, with high financial and transaction costs (even for major actors such as the World Bank).
- Access depends principally on building relationships and research collaboration with researchers or institutes in China.
- Even then, restrictions are often placed on use, which can make analysis more time-consuming.
18. Promising developments do exist
- CASS: discussion of a data / resource library.
- Beijing University (CCER), with support from Michigan: archive of older data sets.
- ISDPP, Beijing Normal University: Social Policy Analysis Information Center.
- Other initiatives in areas of health and population (with international funding).
19. What more can be done?
- Create incentives for making data a public good
- Increase resources to document existing micro-data and meta-data in a consistent format
- Strengthen the capacity and financing to clean, document and manage data sets for greater accessibility
- Invest in institutional infrastructure such as data archives for ease of access and sharing
- Reduce the risks and clarify regulations on data use and sharing
- Funders' conditions for making data public
20. What can ESRC offer China?
- Possible financial or technical assistance.
- Respond to interest among Chinese institutes in data management, preparation and related services.
- Identify and work with institutions concerned with making data more open.
- Training courses and sharing of training materials.
- Create an easily accessible information source / database of available data sets or contacts.
- Provide access to international data sets of interest to Chinese researchers.
- Reciprocal arrangements with UK or EU research institutions or data banks.