Title: Research Metrics: What was proposed, what might work (Jonathan Adams)
1. Research Metrics: What was proposed, what might work
Jonathan Adams
2. Overview
- RAE was seen as burdensome and distorting
- Treasury proposed a metrics-based QR allocation system
- The outline metric model is inadequate, unbalanced and provides no quality assurance
- A basket of metrics might nonetheless provide a workable way of reducing the peer review load
- Research is a complex process, so no assessment system sufficient to purpose is going to be completely light touch
3. The background
- RAE introduced in 1986
- ABRC and UGC consensus to increase selectivity
- Format settled by 1992
- Progressive improvement in UK impact
- Dynamic change and improvement at all levels
4. The RAE period is linked to an increase in UK share of world citations
5. UK performance gain is seen across all RAE grades (data are core sciences, grade at RAE96)
6. Treasury proposals
- RAE peer review produced a grade
- Weighting factor in QR allocation model
- Quality assurance
- But there were doubters
- Community said the RAE was onerous
- Peer review was opaque
- Funding appeared too widely distributed
- Treasury wanted transparent simplification of the allocation side
7. The next steps model
- Noted correlation between QR and earned income (RC or total)
- Evidence drew attention to the statistical link in work on dual support for HEFCE and UUK in 2001-2002
- Treasury hard-wired the model as an allocation system
- So RC income determines QR
- But:
- Statistical correlation is not a sufficient argument
- Income is not a measure of quality and should not be used as a driver for evaluation and reward
8. QR and RC income scale together, but the residual variance would have an impact
HEPI produced additional analyses in its report
9. Unmodified outcomes of the outline metrics model perturb the current system unduly
A new model might produce reasonable change, but few would accept that the current QR allocations are as erroneous as these outcomes suggest
10. The problem
- The Treasury model over-simplifies
- Outcomes are unpredictable
- There are confounding factors such as subject mix
- Even within subjects there are complex cost patterns
- The outcome does not inspire confidence and would affect morale
- There are no checks and balances
- Risk of perverse outcomes, drift from the original model
- Drivers might affect innovation, emerging fields, new staff
- There is no quality assurance
11. What are we trying to achieve?
We want to lighten the peer review burden, so we need indicators to evaluate research performance, but not simplistic mono-metrics
[Diagram: research as a black box. What we want to know: research quality. What we have to use: the measurable inputs (time, funding, numbers) and outputs (time, publications)]
12. Informed assessment comes from an integrated picture of research, not single metrics
13. Data options for metrics and indicators
- Primary data from a research phase
- Input, activity, output, impact
- Secondary data from combinations of these
- e.g. money or papers per FTE
- Three attributes for every datum
- Time, place, discipline
- This limits possible sources of valid data
- Build up a picture
- Weighted use of multiple indicators (see the sketch after this list)
- Balance adjusted for subject
- Balance adjusted for policy purpose
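A minimal sketch in Python of the "weighted use of multiple indicators" idea. All indicator names and weights here are invented for illustration, not taken from the talk; the point is only that the balance of the basket can differ by subject, and that swapping weight tables adjusts it for policy purpose.

from dataclasses import dataclass

@dataclass
class Indicator:
    name: str     # e.g. "papers_per_fte", a secondary indicator
    value: float  # normalised against a subject benchmark, 1.0 = benchmark

# Hypothetical subject-specific weights: the balance differs by discipline.
WEIGHTS = {
    "chemistry": {"income_per_fte": 0.4, "papers_per_fte": 0.3, "rebased_impact": 0.3},
    "history":   {"income_per_fte": 0.1, "papers_per_fte": 0.4, "rebased_impact": 0.5},
}

def basket_score(discipline: str, indicators: list[Indicator]) -> float:
    """Weighted sum of benchmark-normalised indicators for one unit."""
    weights = WEIGHTS[discipline]
    return sum(weights[i.name] * i.value for i in indicators)

# A chemistry unit at benchmark on every indicator scores exactly 1.0.
unit = [Indicator("income_per_fte", 1.0),
        Indicator("papers_per_fte", 1.0),
        Indicator("rebased_impact", 1.0)]
print(basket_score("chemistry", unit))  # 1.0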
14. We need assured data sourcing
- Where the data comes from
- Indicator data must emerge naturally from the process being evaluated
- Artificial PIs are just that, artificial
- Who collects and collates the data
- This affects accessibility, quality and timeliness
- HESA
- Data quality and validation
- Discipline structure
- Game playing
15. We need to agree discipline mapping: what is Chemistry?
16. We have to agree how to account for the distribution of data values, e.g. income
[Chart: spread of income values across units, from minimum to maximum]
17. Distribution of data values - impact
The variables for which we have metrics are skewed and therefore difficult to picture in a simple way
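A toy numeric illustration (invented citation counts) of why a skewed variable resists a single summary number: one highly cited paper pulls the mean far above the median.

from statistics import mean, median

citations = [0, 0, 1, 1, 2, 2, 3, 4, 6, 120]  # one "star" paper dominates
print(mean(citations), median(citations))      # 13.9 vs 2.0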
18. Agree purpose for data usage
- Data are only indicators
- So we need some acceptable reference system
- Skewed profiles are difficult to interpret
- We need simple, transparent descriptions
- Benchmarks
- Make comparisons
- Track changes
- Use metrics to monitor performance
- Set baseline against RAE2008 outcomes
- Check thresholds to trigger fuller reassessment
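As a sketch of the last two bullets, with illustrative band widths and metric names (nothing here is the proposed rule set): take the RAE2008 outcome as the baseline, then invoke fuller reassessment only when variance exceeds the tolerance band on multiple metrics.

BAND_WIDTH = 0.25    # assumed tolerance band around the baseline (+/- 25%)
MIN_METRICS_OUT = 2  # assumed: drift must show on multiple metrics

def needs_panel_review(baseline: dict[str, float], current: dict[str, float]) -> bool:
    """True when enough metrics have moved outside their tolerance band."""
    out_of_band = sum(
        1 for metric, base in baseline.items()
        if abs(current[metric] - base) > BAND_WIDTH * base
    )
    return out_of_band >= MIN_METRICS_OUT

# Example: impact moves sharply but income and output hold steady,
# so a single-metric excursion does not trigger a panel review.
rae2008 = {"rebased_impact": 1.1, "income_per_fte": 120_000.0, "papers_per_fte": 2.4}
latest  = {"rebased_impact": 1.5, "income_per_fte": 118_000.0, "papers_per_fte": 2.5}
print(needs_panel_review(rae2008, latest))  # False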
19. Example - categorising impact data
This grouping is the equivalent of a log2 transformation. There is no place for zero values on a log scale.
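A minimal sketch of that grouping, assuming bands that double at each step (the band edges are illustrative): impact values fall into log2-width categories, and uncited papers are held in their own category because zero cannot sit on a log scale.

import math

def impact_category(rebased_impact: float) -> str:
    """Band an impact value into log2-width categories; zeros stay separate."""
    if rebased_impact == 0:
        return "uncited"  # no place for zero on a log scale
    band = math.floor(math.log2(rebased_impact))
    return f"{2.0 ** band:g}-{2.0 ** (band + 1):g}x"

# Example: 0, 0.4, 1.2 and 5.0 land in four different categories.
for v in (0.0, 0.4, 1.2, 5.0):
    print(v, "->", impact_category(v))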
20. UK ten-year profile: 680,000 papers
[Chart: impact profile of the ~680,000 UK papers over ten years, marking the mode, the mode of cited papers, the median, the average RBI of 1.24, and a possible threshold of excellence]
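RBI is not expanded on the slide; in this context it is presumably "rebased impact", i.e. citations per paper normalised to the world average for the same field and year, so an average RBI of 1.24 would mean 24% above world average. On that assumption:

\[
\mathrm{RBI} = \frac{c/p}{C/P}
\]

where c and p are the unit's citations and papers, and C and P are the world totals for the matching field and year; RBI = 1 is the world average.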
21. Subject profiles and UK reference
22. HEIs: 10-year totals (4.1)
Smoothing the lines would reveal the shape of the profile
23. HEIs: 10-year totals (4.2)
Absolute volume would add a further element for comparisons
24. Conclusions
- We can reduce the peer review burden by increased use of metrics
- But the transition won't be simple
- Research is a complex, expert system
- Assessment needs to produce
- Confidence among the assessed
- Quality assurance among users
- Transparent outcome for funding bodies
- Light touch is possible, but not featherweight
- Initiate a metrics basket linked to RAE2008 peer review
- Set benchmarks and thresholds, then track the basket
- Invoke panel reviews to evaluate change, but only where variance exceeds band markers across multiple metrics
25. Overview (reprise)
- RAE was seen as burdensome and distorting
- Treasury proposed a metrics-based QR allocation system
- The outline model is inadequate, unbalanced and provides no quality assurance
- A basket of metrics might nonetheless provide a workable way of reducing the peer review load
- But research is a complex process, so no assessment system sufficient to purpose is going to be completely light touch