Title: MultiLanguage Extensibility in MS SQL Server
1Multi-Language Extensibility in MS SQL Server
- James Hamilton
- JamesRH_at_microsoft.com
- Microsoft SQL Server
- 2002.06.25
2Agenda
- Common Language Runtime (CLR) Integration
Overview - Basic infrastructure
- Design philosophy
- 4 Ss Safety, Security, Scalability, Speed
- SQL Features Enabled
- CLR assemblies
- Scalar functions
- Relational functions
- Aggregate functions
- Stored Procedures
- Triggers
- Types and Methods
- SQL Types Package
- Access methods
- Systems Integration issues
3Basic Infrastructure
- CLR Hosted inside SQL Server
- 4Ss safety, security, scalability, speed
- Run verified, type-safe code in process
- .NET Framework languages (VB, C, Java, )
- Competitive offerings
- Oracle DB2 in-proc Java JDBC
- Data access in process
- Based on ADO.NET
- Same programming model as middle-tier
- SQLTypes support
- SQL type semantics in managed code on client
server tiers
4Philosophy Safety Security
- Safety
- User code does not compromise integrity of
server process - Verifiable code
- Leverage CLRs code access security
- User code cannot call UI, create threads,
synchronization, or call unmanaged code - Security
- Access to SQL data from user code via SQL
authorization model - Access to system resources from user code via
.NET Framework code permissions - Administrators control permissions given to
assemblies
5Philosophy Security model
- Resources protected by permissions
- threads, files, unmanaged code access etc.
- API that exposes a resource
- introduces a demand for that permission
- Stack-walk based permission check
- every assembly in call-stack has permission
- Permissions granted to assemblies determined by
machine and user-level security policy - Allows host to add another policy layer to
further restrict permissions
6Philosophy 3 Permission Sets
- SAFESQL
- Internal computation plus data access
- No access to resources outside SQL Server
- No unmanaged calls
- Must be verifiable
- EXTERNAL ACCESS
- SAFESQL access to external resources
- Requires EXTERNAL ACCESS permission to create
- SQL Server will impersonate the caller
- Must be verifiable
- UNRESTRICTED
- No controls can call unmanaged, un-verifiable
- Only Sysadmin can create
7Philosophy Scalability Speed
- Scalability
- As many concurrent users as TSQL
- Integrated SQL and runtime threads
- Collaboration between SQL and GC
- Speed
- Efficient data access in process
- Compiled user code, not interpreted as TSQL
- Fast transitions in/out of runtime
8Philosophy Functions speed
- .NET functions approximating speed of TSQL inline
expressions - .NET Framework functions much faster than TSQL
functions for complex expressions
9Agenda
- CLR Integration Overview
- Basic infrastructure
- Design philosophy
- 4 Ss Safety, Security, Scalability, Speed
- SQL Features Enabled
- CLR assemblies
- Scalar functions
- Relational functions
- Aggregate functions
- Stored Procedures
- Triggers
- Types and Methods
- Access methods
- SQL Types Package
- Systems Integration issues
10Development Steps
VS .NET Project
Runtime hosted inside SQL
11Assembly Registration
- CREATE ASSEMBLY lib_geom FROM \\m1\types\geometry
.dll - WITH PERMISSION SET SAFE WITH AUTOREGISTER
- DROP ASSEMBLY lib_geom
- Assemblies stored in DB
- Backup, restore, etc. with data
- Code permissions assigned per assembly
- SafeSQL, external access, unrestricted
- Autoregister functions
- Using .NET custom attributes
- Assembly benefits
- Self-describing metadata types file
dependencies - Unit of deployment permissions versioning
12ALTER Assembly
- Schema bound
- Cannot invalidate persistent data or indexes
- No tables with columns of UDT from this assembly
- No indexes on functions of that assembly
- Packaging considerations
- Place routines and types in different assemblies
13Register A Function
- CREATE FUNCTION distance (
_at_x1 int, _at_y1
int, _at_x2 int, _at_y2 int ) RETURNS float
EXTERNAL NAME lib_geomCPoint.Distance
DETERMINISTIC
RETURNS NULL ON
NULL INPUT - DROP FUNCTION distance
- Functions called from queries
- Static class functions
- Deterministic functions
- No SQL updates or access to global state
- Can be scalar or table-valued
- Using a function in a query
- SELECT s.name FROM Supplier s
- WHERE dbo.distance( s.x, s.y, _at_x, _at_y )
14Call Function
SELECT name FROM Employee WHERE dbo.tax(salary)
50000
SQLSERVR.EXE
Common Language Runtime
Managed code
Unmanaged code
15Register A Procedure
- CREATE PROCEDURE check_inventory
- EXTERNAL NAME lib_eventsCInventory.check_level
- DROP PROCEDURE check_inventory
-
- Procedures called directly
- Can contain SQL queries, updates, or DDL
- Can return results directly to client
- Not directly callable (as functions are) from
queries
16Register A Trigger
- CREATE TRIGGER supplier_event ON supplier
- AFTER INSERT, UPDATE
- EXTERNAL NAME lib_eventsCNotif.Supp_Event
- DROP TRIGGER supplier_event
- Similar to procedures
- with access to inserted deleted tables
17UDT Introduction
- Extends SQL Server type system
- Complex structure associated behavior
- Managed classes in any CLR language
- Functionally subsume SQL-99 distinct types
- equivalent to structured types
- Other products Oracle Cartridges, Informix
DataBlades, IBM DB Extenders
18UDT Ordering values
- Two kinds of ordering supported
- Binary ordering indicated via CLR property
- public const bool IsByteOrdered
- Operator-based ordering
- Overloaded comparison operators
-
- public static SQLBool operator
- (, )
19UDT Creating
- Registered as a type from an already registered
assembly - CREATE ASSEMBLY MapLib
- FROM \\mysrv\share\MapLib.dll
- CREATE TYPE Point
- EXTERNAL NAME MapLibBasetypes
-
20UDT Instantiating
- Can be declared as column type
-
- Create table Cities(
- Name varchar(20),
- State varchar(20),
- Location Point DEFAULT new Point(0,0))
- Variables params of T-SQL .NET routines can
UDTs
21UDT Read operations
- Registered methods, properties, public data
members can be use in SQL queries - Methods assumed to be non-mutators unless marked
using custom attribute - Only non-mutators allowed in SELECTs
- Declare _at_p
- Set _at_p new point(32, 23)
- Select LocationDistance(_at_p)
- From Cities
22UDT Ordering Operations
- Indexing, UNIQUE, PRIMARY KEY constraints
- Requires binary ordering support
- Inconsistent operator based ordering can cause
corrupt indices, incorrect query results - ORDER BY, GROUP BY, comparison operators
- If UDT supports binary ordering, always use
binary ordering - Else use overloaded operators rely on
consistency and correctness of UDT implementation
23UDT UDT Write Operations
- Constructor can be called using new operator
- UPDATE Cities
- SET Location new Point(12.3, 46.2)
- Properties and public data members can be
modified through assignment - UPDATE Cities
- SET LocationX 23.5,
- LocationY 23.5
- WHERE Name Anchorage
24UDT Write Operations
- Methods marked mutators callable in UPDATEs
- UPDATE Cities
- SET LocationSetXY(23.5, 23.5)
- WHERE Name Anchorage
- INSERTs get values from string, binary form or
from constructor invocation - INSERT Cities(Anchorage, Alaska, 19.32.3)
25UDAggs Required Methods
- Implement QP interface to aggregate values over a
group - Interface needed
- Initialize a group
- Pass values in the group to accumulate the
aggregation - Merge multiple groups (for parallel plans)
- Finalize computation and retrieve result
26Creating UDAggs
- Aggregates implemented as set of methods packaged
in a .NET class - UDAgg is bound to a class in existing assembly
- CREATE AGGREGATE Concat(nvarchar)
- RETURNS nvarchar
- EXTERNAL MyLibConcat
27UDAgg class definition e.g.
- Public class Concat
- //Private store for accumulating results.
- private SQLString agg
-
- //Optimizer properties
- public static bool IsNullOnNullsSet() return
true - public static bool IsNullOnEmptySet() return
false -
- //Aggregation interface
- public void Init()
- public void Accum(SQLString str)
- public void Merge(Concat otheragg)
- public SQLString Term()
-
-
28Final Stage Access Methods
- Complete extensibility solution requires user
defined access methods - Informix supports adding access methods
- Expose locking, concurrency control, recovery,
etc. - Hard to make work in general few engineers able
to write internal storage engine code well - Many will try and they will have bugs hard on
product quality image - Solution Tailor existing access methods
- E.g. map spatial queries to queries over 2
B-trees or single Z-transform - Query rewrite language such that system can
rewrite a function as an appropriate set of
operations over 1 or more tables/indexes
29CLR SQL Types Package
- Defines C, VB, Java types corresponding SQL
Server types - Reduce impedance mismatch between programming
language data - Consistent expression evaluation in mid-
server-tier programming - SQL Types library
- Managed classes system.Data.SQLTypes
- Provide SQL semantics
- Nullability, three-valued logic
- Precision scale in operations
30SQL Types Example
- Tax function implemented with SQL types
using System
using System.Data.SQLTypes
public class myFinances
public static
SQLDouble tax( SQLDouble sal )
if ( sal 0.15 if (
sal 50000.0 sal 0.23 else return sal 0.35
31Agenda
- CLR Integration Overview
- Basic infrastructure
- Design philosophy
- 4 Ss Safety, Security, Scalability, Speed
- SQL Features Enabled
- CLR assemblies
- Scalar functions
- Relational functions
- Aggregate functions
- Stored Procedures
- Triggers
- Types and Methods
- Access methods
- SQL Types Package
- Systems Integration issues
32Integration In-Proc Data Access
- Goals
- Symmetric model
- Client, server, mid-tier
- Fully supported by dev tools
- VS dev environment including debug
- In same address space as server
- Dont marshal to TDS (tabular data stream)
- Dont loop-back through network interface
- Avoid unnecessary copies
- Avoid unnecessary transitions across
managed/unmanaged (into VM) interface
33Integration Process Model
- Thread integration
- SQL Server non-preemptive user-level thread
scheduling - Fibers multiplexed on O/S thread with thread
migration - Implication TLS storage used by VM wont work in
SQL execution environment - Lazy preemptive
- When SQL calls potentially blocking code it must
go preemptive (allocate a thread) - So thread required for each VM call expensive
- Lazy pre-emptive assume preemption not required
and set timer to catch rare cases when thread
allocation is needed - Garbage collection blocks VM threads
- Hard on multi-user DB throughput
- GC doesnt directly suspend calls DB to suspend
and DB schedules other non-VM hosted work
34Integration Memory Management
- Memory allocation from DB system (rather than via
O/S) - Allows memory resources to be used for different
purposes over time - Garbage collection very general but DB systems
often have more info - e.g. free memory pool at end of statement
- Goal
- DB memory costing able to request memory from VM
when needed - Per user memory consumption tracking quotas
35Integration Remaining Issues
- Assemblies (code) kept in database
- Dispatched to VM through buffers
- Stream interface would be better yet
- Support attention propagation (user interrupt)
into VM - Track all VM resources
- O/S portability layer tailored to DB execution
environment - Error containment/minimize multi-user impact of
failures - Out-of-proc execution
- Multiple VMs
36Integration Query optimization
- Gather function cost statistics
- Value histograms, execution cost
- Reorder of predicate evaluation
- Based on cost of evaluation
- Function indexes
- Speed up expensive functions
- Extends computed column indexes and indexed
(materialized) views
37CLR Integration Summary
- DB extensibility without product stability risk
- Ability to add types and behavior w/o access to
engine source code - Scalable, safe, secure fast
- Rich server programming model
- Any CLR language
- Symmetric mid- server-tier model
- Better development environment
- Integrates SQL tools with Visual Studio