Friday, October 29, 2010

PGDay LA - Call for Talks


PGDay is scheduled at this year's Southern California Linux Exposition (SCALE) held in the LAX Hilton Hotel in the city of
Los Angeles, California, on Friday February 25th, 2011. It will consist of one full day of talks about PostgreSQL.

We are looking for talks designed for a general audience of web developers, sysadmins, DBAs and open source users. Talks should have significant technical content. This includes:
  • Interesting PostgreSQL Case Studies
  • Solutions to common problems
  • Performance, scaling and optimization
  • Administration
  • Using PostgreSQL with popular languages or tools
  • Replication and clustering
Talks should be 50 minutes in length.  We are also accepting 5 minute "lightning talks".
 
Talks should be submitted by January 5, 2011.  Final speakers will be chosen by January 15, 2011.
 
Post your talk submission to pgday-submissions@googlegroups.com.
 
Submissions should include:
  • Speaker Full Name
  • E-mail
  • Cell Phone
  • Title of Talk
  • Description of Talk
  • Notes for Committee
For more information visit: PGDay LA

Friday, June 4, 2010

Fun With Updatable Views

Views are useful tools when wrapping away complicated SQL statements or limiting access to sensitive data.

Directing attention specifically to updatable views, while PostgreSQL is still working toward implementing the official syntax for updatable views, most of the functionality described in the standard can be achieved when adding DO INSTEAD rules to a views.

If one keeps to the design constraints imposed by the standard, these views can be used reliably and in harmony with the RDBMS. Here is a partial listing of constraints:
  1. The view can be derived from at most one table. This means no relations that are the results of joins or unions.(intersection and exceptions can be okay though.)
  2. No aggregation.
  3. Any excluded columns must be nullable or have a default value.

So far so good, but where's the fun in having updatable view?

Well, on my second read-though of SQL for Smarties, I ran across a suggested "good practice" to constrain all values that are part of an attribute domain to a single system of units. For example, all measurable values "should" have units defined by either the SI or the English unit system.

Two additional suggestions were to further constrain attributes of a certain measure to use only the system's primary unit types (i.e. kilograms as opposed to grams), and where possible use non-divisible units as opposed to composite units. An example is to store distance and time in different attributes instead of a single attribute for speed (distance/time).

So, what useful purpose can an updatable view provide where an entire schema is constrained to a single system of units?

Let's say there were two teams working on the same data. The first team only works in the metric system, the other only works in the english system. An updatable view can be used as a way to translate the data from one unit system to another so that each team sees the data in the unit system that they're familiar with.

(The use of views as a unit translation mechanism brings up an important concern regarding floating point data and fixed point data which can be represented using PostgreSQL's real, double precision, or numeric data types. Conversions and calculations introduce rounding errors in these types. So an important design consideration would be to allow for sufficient precision to reduce the effect of rounding errors giving consideration to the usable range of the specific problem domain. To illustrate in the case of the numeric data type, data collected for a machinist would require setting a precision and scale for accurate small fractional values (for example numbers between 0.000000 to 99.999999), while data collected for a surveyor would require setting a precision and scale for large numbers (for example numbers between between 0.0 and 9999999.9.) The following links are useful references for non-exact data types:

http://en.wikipedia.org/wiki/Fixed-point_arithmetic#Precision_loss_and_overflow
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems )

Getting back to the topic at hand, here is an example that demonstrates the technique of using an updatable view in PostgreSQL:

First let's start by creating two schema's, one for relations having data in the SI unit system and one for relations in the English unit system. Also throughout the schema design, let's add some comments so that we can document our design intentions.

CREATE SCHEMA si;


COMMENT ON SCHEMA si IS


'The si schema holds all that is metric
i.e. Newtons(KgM/S^2), Kilogram (Kg), Meter (M), Second (S)';


CREATE SCHEMA eu;



COMMENT ON SCHEMA eu IS

'The eu schema presents data in the english system

i.e. Pounds(lbs), Feet(ft), Seconds(s)';



Now that we have our schema, we need to decide on the unit system we'll base our tables on. Let's pick the SI system.


SET search_path to si;


CREATE TABLE si.widgets(

widget_nbr integer NOT NULL,

mass numeric(9,5) NOT NULL
DEFAULT 0.000, -- stored as Kilograms


length numeric(9,5) NOT NULL
DEFAULT 0.000, -- stored as Meters.


width numeric(9,5) NOT NULL
DEFAULT 0.000, -- stored as Meters.


height numeric(9,5) NOT NULL
DEFAULT 0.000, -- Stored as meters


CONSTRAINT widgets_pkey
PRIMARY KEY (widget_nbr),


CONSTRAINT valid_widget_depth_check
CHECK (height >= 0.000),


CONSTRAINT valid_widget_length_check
CHECK (length >= 0.000),


CONSTRAINT valid_widget_mass_check
CHECK (mass >= 0.000),


CONSTRAINT valid_widget_width_check CHECK (width >= 0.0000));




COMMENT ON COLUMN si.widgets.mass
IS 'stored as Kilograms';


COMMENT ON COLUMN si.widgets.length
IS '
stored as Meters';

COMMENT ON COLUMN si.widgets.width
IS '
stored as Meters';

COMMENT ON COLUMN si.widgets.height
IS '
stored as Meters';


Next we'll create functions to preform the data-type conversion in the view definition. (Note these functions are for illustrative purpose only. :) )


CREATE FUNCTION eu.get_feet_from_meter(metervalue numeric)
RETURNS numeric AS
'SELECT $1 * 3.280839895'
LANGUAGE 'sql' IMMUTABLE STRICT;


CREATE FUNCTION eu.get_meter_from_feet(feetvalue numeric)
RETURNS numeric AS
'SELECT $1 * 0.3048;'
LANGUAGE 'sql' IMMUTABLE STRICT;

CREATE FUNCTION eu.get_lbs_from_kg(kgvalue numeric)
RETURNS numeric AS
'select $1 * 2.2046226218;'
LANGUAGE 'sql' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION eu.get_kg_from_lbs(lbsvalue numeric)
RETURNS numeric AS
'SELECT $1 * 0.45359237;'
LANGUAGE 'sql' IMMUTABLE STRICT;



Now that we have the conversion functions we can create the English unit view in the eu schema.


SET search_path = eu;

CREATE VIEW eu.widgets AS
SELECT widget_nbr,
eu.get_lbs_from_kg(widgets.mass) AS weight,
eu.get_feet_from_meter(widgets.length) AS length,
eu.get_feet_from_meter(widgets.width) AS width,
eu.get_feet_from_meter(widgets.height) AS height
FROM si.widgets;


Now lets throw on a few "instead of rules" to make the view updatable:


CREATE RULE insert_to_si_widgets AS
ON INSERT TO eu.widgets DO INSTEAD
INSERT INTO si.widgets (widget_nbr, mass, length, width, height)
VALUES (new.widget_nbr,
eu.get_kg_from_lbs(new.weight),
eu.get_meter_from_feet(new.length),
eu.get_meter_from_feet(new.width),
eu.get_meter_from_feet(new.height));



CREATE OR REPLACE RULE update_to_si_widgets AS
ON UPDATE TO eu.widgets DO INSTEAD
UPDATE si.widgets
SET widget_nbr = new.widget_nbr,
mass = eu.get_kg_from_lbs(new.weight),
length = eu.get_meter_from_feet(new.length),
width = eu.get_meter_from_feet(new.width),
height = eu.get_meter_from_feet(new.height)
WHERE widget_nbr = old.widget_nbr;



CREATE OR REPLACE RULE delete_from_si_widgets AS
ON DELETE TO eu.widgets DO INSTEAD
DELETE FROM si.widgets

WHERE widget_nbr = old.widget_nbr;




After adding some test data using the view, we can see representations of the data in both unit systems:


SELECT widget_nbr, round( mass, 4) AS mass,
round( length, 4 ) as length, 'SI' AS UNITS
FROM si.Widgets
UNION ALL
SELECT widget_nbr, round(weight, 4) as mass,
round( length, 4 ) as length, 'EU' AS UNITS
FROM eu.Widgets;

widget_nbr | mass | length | units
------------+-----------+---------+-------
1 | 1.8144 | 0.3048 | SI
2 | 127.0059 | 0.3048 | SI
3 | 680.3886 | 3.0480 | SI
1 | 4.0000 | 1.0000 | EU
2 | 280.0000 | 1.0000 | EU
3 | 1500.0000 | 10.0000 | EU
(6 rows)



There is rounding error during the conversion, but the error is small and manageable according the requirements of this illustrative example.

At this point we have a useful working example, but what do we do for those that don't want to present everything in primary units such as lbs or Kilograms?

We'll see later on.

Friday, February 13, 2009

EAV - The Good and Bad

The Entity-Attribute-Value data model (commonly referred to as EAV), is a model used when dealing with Entity or Relationship type specific attributes and associated values.

I've often read about the principle in application development to "encapsulate what varies". Essentially the desire is to find software solutions that are able to handle changes in tht application's requirements without the need for frequent code modification. When the need arises to change the attributes persisted with an entity, the EAV data model encapsulates these changes exceedingly well.

While application developers like this quality some DBA's do not. Since part of the database's schema design is now abstracted away into the EAV tables, the application users have the power to choose attributes and values that will be used at will. While this can be a good thing, there are several problems that arise from this flexibility.
  1. Application users will not always agree on a naming convention for attributes. Duplicated Attributes with similar names appear over time. This is highly undesirable since the users are essentially corrupting the data model's conceptual schema with bad attribute names.
  2. There is no way to implement checks or foreign keys in the RDBMS tier for the attributes' value field to prevent invalid data.
  3. Users are able to assign invalid attributes to an entity type. Alternatively there is no way to ensure that the users will add all valid attributes that an entity type should have.
  4. As EAV tables grow in size they become SELECT statement performance killers.

Unfortunately for the DBA, the DBA is usually given the tedious task of correcting the above mentioned problems. And while the DBA may be successful in correcting some of these problems, he or she does not have the controls necessary to prevent these problems from reoccurring in the future.

So while the EAV data model is an attractive tool that application developers can use, consideration should be given to this model's weaknesses. Mitigating these weaknesses in the application tier will make the application more robust and will make the task of maintenance easier for the DBA.