Virtuso Data Space Bot
Burlington, United States

Compare & Contrast: SQL Server's Linked Server vs Virtuoso's Virtual Database Layer

Microsoft SQL Server's Linked Server Promise

The ability to use distributed queries -- i.e., to issue SQL queries against any OLE-DB-accessible back end -- via Linked Servers.

The promise fails to materialize, primarily because while there are several ways of issuing such distributed queries, none of them work with all data access providers, and even for those that do, results received via different methods may differ.

Compounding the issue, there are specific configuration options which must be set correctly, often differing from defaults, to permit such things as "ad-hoc distributed queries".

Common tools that are typically used with such Linked Servers include SSIS and DTS. Such generic tools typically rely on four-part naming for their queries, expecting SQL Server to properly rewrite remotely executed queries for the DBMS engine which ultimately executes them.

The most common cause of failure is that when SQL Server rewrites a query, it typically does so using SQL-92 syntax, regardless of the back end's abilities, and using the Transact-SQL dialect for implementation-specific query syntaxes, regardless of the back end's dialect. This leads to problems especially when the Linked Server is an older variant which doesn't support SQL-92 (e.g., Progress 8.x or earlier, Informix 7 or earlier), or whose SQL dialect differs substantially from Transact-SQL (e.g., Informix, Progress, MySQL, etc.).

Basic Four-Part Naming

SELECT * FROM linked_server.[catalog].[schema].object

Four-part naming presumes that you have pre-defined a Linked Server, and executes the query on SQL Server. SQL Server decides what if any sub- or partial-queries to execute on the linked server, tends not to use appropriate syntax for these, and usually does not take advantage of linked server or provider features.

OpenQuery

SELECT * FROM OPENQUERY ( linked_server , 'query' )

OpenQuery also presumes that you have pre-defined a Linked Server, but executes the query as a "pass-through", handing it directly to the remote provider. Features of the remote server and the data access provider may be taken advantage of, but only if the query author knows about them.

From the product docs:

SQL Server's Linked Server extension executes the specified pass-through query on the specified linked server. This server is an OLE DB data source. OPENQUERY can be referenced in the FROM clause of a query as if it were a table name. OPENQUERY can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement. This is subject to the capabilities of the OLE DB provider. Although the query may return multiple result sets, OPENQUERY returns only the first one.

...OPENQUERY does not accept variables for its arguments. OPENQUERY cannot be used to execute extended stored procedures on a linked server. However, an extended stored procedure can be executed on a linked server by using a four-part name.

OpenRowset

SELECT * FROM OPENROWSET ( 'provider_name' , 'datasource' ; 'user_id' ; 'password' , { [ catalog. ] [ schema. ] object | 'query' } )

OpenRowset does not require a pre-defined Linked Server, but does require the user to know what data access providers are available on the SQL Server host, and how to manually construct a valid connection string for the chosen provider. It does permit both "pass-through" and "local execution" queries, which can lead to confusion when the results differ (as they regularly will).

More from product docs:

Includes all connection information that is required to access remote data from an OLE DB data source. This method is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB. For more frequent references to OLE DB data sources, use linked servers instead. For more information, see Linking Servers. The OPENROWSET function can be referenced in the FROM clause of a query as if it were a table name. The OPENROWSET function can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement, subject to the capabilities of the OLE DB provider. Although the query might return multiple result sets, OPENROWSET returns only the first one.

OPENROWSET also supports bulk operations through a built-in BULK provider that enables data from a file to be read and returned as a rowset.

...OPENROWSET can be used to access remote data from OLE DB data sources only when the DisallowAdhocAccess registry option is explicitly set to 0 for the specified provider, and the Ad Hoc Distributed Queries advanced configuration option is enabled. When these options are not set, the default behavior does not allow for ad hoc access.

When accessing remote OLE DB data sources, the login identity of trusted connections is not automatically delegated from the server on which the client is connected to the server that is being queried. Authentication delegation must be configured. For more information, see Configuring Linked Servers for Delegation.

Catalog and schema names are required if the OLE DB provider supports multiple catalogs and schemas in the specified data source. Values for catalog and schema can be omitted when the OLE DB provider does not support them. If the provider supports only schema names, a two-part name of the form schema.object must be specified. If the provider supports only catalog names, a three-part name of the form catalog.schema.object must be specified. Three-part names must be specified for pass-through queries that use the SQL Server Native Client OLE DB provider. For more information, see Transact-SQL Syntax Conventions (Transact-SQL).

OPENROWSET does not accept variables for its arguments.

OpenDataSource

SELECT * FROM OPENDATASOURCE ( 'provider_name' , 'provider_specific_datasource_specification' ).[catalog].[schema].object

As with basic four-part naming, OpenDataSource executes the query on SQL Server. SQL Server decides what if any sub-queries to execute on the linked server, tends not to use appropriate syntax for these, and usually does not take advantage of linked server or provider features.

Additional doc excerpts

Provides ad hoc connection information as part of a four-part object name without using a linked server name.

...OPENDATASOURCE can be used to access remote data from OLE DB data sources only when the DisallowAdhocAccess registry option is explicitly set to 0 for the specified provider, and the Ad Hoc Distributed Queries advanced configuration option is enabled. When these options are not set, the default behavior does not allow for ad hoc access.

The OPENDATASOURCE function can be used in the same Transact-SQL syntax locations as a linked-server name. Therefore, OPENDATASOURCE can be used as the first part of a four-part name that refers to a table or view name in a SELECT, INSERT, UPDATE, or DELETE statement, or to a remote stored procedure in an EXECUTE statement. When executing remote stored procedures, OPENDATASOURCE should refer to another instance of SQL Server. OPENDATASOURCE does not accept variables for its arguments.

Like the OPENROWSET function, OPENDATASOURCE should only reference OLE DB data sources that are accessed infrequently. Define a linked server for any data sources accessed more than several times. Neither OPENDATASOURCE nor OPENROWSET provide all the functionality of linked-server definitions, such as security management and the ability to query catalog information. All connection information, including passwords, must be provided every time that OPENDATASOURCE is called.

Virtuoso's Virtual Database Promise & Deliverables

The ability to link objects (tables, views, stored procedures) from any ODBC-accessible data source. This includes any JDBC-accessible data source, through the OpenLink ODBC Driver for JDBC Data Sources.

There are no limitations on the data types which can be queried or read, nor must the target DBMS have primary keys set on linked tables or views.

All linked objects may be used in single-site or distributed queries, and the user need not know anything about the actual data structure, including whether the objects being queried are remote or local to Virtuoso -- all objects are made to appear as part of a Virtuoso-local schema.

01/31/2010 18:04 GMT-0500
Compare & Contrast: Oracle Heterogeneous Services (HSODBC, DG4ODBC) vs Virtuoso's Virtual Database Layer

Oracle Gateway Promise

Ability to use distributed queries over a generic connectivity gateway (HSODBC, DG4ODBC) -- i.e., to issue SQL queries against any ODBC- or OLE-DB-accessible linked back end.

Reality

Promise fails to materialize for several reasons. Immediate limitations include:

All tables locked by a FOR UPDATE clause and all tables with LONG columns selected by the query must be located in the same external database.

Distributed queries cannot select user-defined types or object REF datatypes on remote tables.

In addition to the above, which apply to database-specific heterogeneous environments, the database-agnostic generic connectivity components have the following limitations:

  • A table including a BLOB column must have a separate column that serves as a primary key.
  • BLOB and CLOB data cannot be read by passthrough queries.
  • Updates or deletes that include unsupported functions within a WHERE clause are not allowed.
  • Generic Connectivity does not support stored procedures.
  • Generic Connectivity agents cannot participate in distributed transactions; they support single-site transactions only.
  • Generic Connectivity does not support multithreaded agents.
  • Updating LONG columns with bind variables is not supported.
  • Generic Connectivity does not support rowids.

Compounding the issue, the HSODBC and DG4ODBC generic connectivity agents perform many of their functions by brute-force methods. Rather than interrogating the data access provider (whether ODBC or OLE DB) or DBMS to which they are connected, to learn their capabilities, many things are done by using the lowest possible function.

For instance, when a SELECT COUNT (*) FROM table@link is issued through Oracle SQL, the target DBMS doesn't simply perform a SELECT COUNT (*) FROM table. Rather, the gateway issues a SELECT * FROM table, which is used to inventory all columns in the table, and then issues and fully retrieves SELECT field FROM table into an internal temporary table, where it does the COUNT (*) itself, locally. Testing has confirmed this process to be the case despite Oracle documentation stating that target data sources must support COUNT (*) (among other functions).
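The cost difference between the two strategies can be sketched with a small, self-contained example. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the remote DBMS, and the `orders` table and all function names are hypothetical. None of this is Oracle's or Virtuoso's actual code.

```python
import sqlite3


def make_remote(n_rows):
    """Create an in-memory SQLite database standing in for the remote DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i * 1.5) for i in range(n_rows)],
    )
    return conn


def count_pushed_down(conn):
    """Let the remote engine compute the aggregate; one row crosses the link."""
    rows = conn.execute("SELECT COUNT(*) FROM orders").fetchall()
    return rows[0][0], len(rows)


def count_brute_force(conn):
    """Fetch one column for every row and count locally (the behavior described above)."""
    rows = conn.execute("SELECT id FROM orders").fetchall()
    return len(rows), len(rows)


remote = make_remote(10_000)
pushed_count, pushed_transferred = count_pushed_down(remote)
brute_count, brute_transferred = count_brute_force(remote)
print(pushed_count, pushed_transferred)  # count, rows transferred
print(brute_count, brute_transferred)    # same count, far more rows transferred
```

Both approaches return the same count, but the brute-force variant transfers every row across the link, so its cost grows with the table size rather than staying constant.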

Virtuoso's Virtual Database Comparison

The Virtuoso Universal Server will link/attach objects (tables, views, stored procedures) from any ODBC-accessible data source. This includes any JDBC-accessible data source, through the OpenLink ODBC Driver for JDBC Data Sources.

There are no limitations on the data types which can be queried or read, nor must the target DBMS have primary keys set on linked tables or views.

All linked objects may be used in single-site or distributed queries, and the user need not know anything about the actual data structure, including whether the objects being queried are remote or local to Virtuoso -- all objects are made to appear as part of a Virtuoso-local schema.

01/31/2010 18:03 GMT-0500
Linked Data and Virtuoso in 2010

It is again time for the end-of-year blog post.

In 2009, RDF scalability questions were solved in their broad outline and the corresponding Virtuoso release was built and used in production internally. Its general availability is now imminent while it has been available on a case-by-case basis thus far.

In 2010, we take on a new challenge: To bring RDF closer to parity with equivalent relational solutions. This will also entail some significant improvements to our relational technology.

Storage density is a key ingredient of performance. Some of the advances will be in this area; other advances will be in increased parallelism of execution. Right now we run things in vectored batches in cluster situations where message latency forces operations to be shipped in large chunks. Next we will do this across the board, also in single servers. The advantages of this for cache behavior and other factors are known in the literature.

Looking at environmental factors, we have a new SPARQL at a Working Draft stage. We have basic parity with SQL expressivity, which is a prerequisite for RDF to become a data model that can be an alternative to relational outside of very specialized contexts.

As the standards process makes SPARQL closer to being an alternative to SQL for data integration, we will make the database engine technology such that RDF's inherent penalty in terms of storage overhead and processing time substantially decreases. This will make RDF a workable integration medium also in places where it was not such before. Of course, an application-specific schema will retain some advantage over a generic one, but then one can have a purely relational application on Virtuoso as well. Just think of the possibility of an application-specific schema emerging by itself in a workload-driven fashion.

As background data for an increasing number of fields becomes available as linked data, using this together with proprietary data for analytics and discovery becomes increasingly interesting. This is the initial line of RDF data warehousing. The biomedical field has many examples. The technologies we will release during 2010 will be geared towards enabling a second line of RDF applications, where ad hoc agile integration with RDF as a lingua franca becomes a real alternative to relational solutions with ETL point solutions for harvesting information from diverse systems. One may see how RDF's flexibility and expressivity may add to agility in any number of situations where data from heterogeneous sources needs to be integrated. Which of today's business scenarios does not face this issue?


12/29/2009 10:24 GMT-0500 Modified: 02/01/2010 09:14 GMT-0500
RDF Geography With Virtuoso

We have just added a geometry data type and corresponding R-tree index to Virtuoso. This follows the general scheme of SQL/MM, as is implemented by PostGIS and many others. We have all the engine-side stuff, including optimizer support for geometry cardinality sampling and good execution plans for combinations of spatial and other joins. We have however not yet implemented all the different geometry types and library function support for them, like shortest distance between two arbitrary shapes.

The geometry support is for both SQL and SPARQL. On the SQL side, it works with the ISO/IEC 13249 SQL/MM API; with RDF, a geometry can occur as the object of a quad. If the object is a typed literal of the virtrdf:Geometry type, it gets indexed in a geometry index over all geometries in quads; no special declarations are needed. After this, SQL/MM predicates and functions can be used with SPARQL, like this:

  PREFIX  geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>  
  SELECT  ?class
          COUNT (*) 
   WHERE  { ?m  geo:geometry  ?geo    . 
            ?m  a             ?class  . 
                FILTER ( <bif:st_intersects> 
                          ( ?geo, 
                            <bif:st_point> (0, 52), 
                            100
                          )
                       )
          } 
GROUP BY  ?class 
ORDER BY  DESC 2 
 

This returns the counts of objects of each class occurring within 100 km of (0, 52), a point near London.
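As a rough illustration of what "within 100 km of (0, 52)" means, the following Python sketch computes great-circle distances with the haversine formula. It is not how Virtuoso evaluates st_intersects, which relies on the R-tree index described above. The function names, the longitude-first argument order, the mean Earth radius, and the approximate city coordinates are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two WGS 84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_km(point, center, radius_km):
    """True if point lies within radius_km of center; both are (lon, lat) pairs."""
    return haversine_km(*point, *center) <= radius_km


center = (0.0, 52.0)          # (longitude, latitude), as in the query above
london = (-0.1278, 51.5074)   # approximate coordinates, for illustration only
paris = (2.3522, 48.8566)     # approximate coordinates, for illustration only

print(within_km(london, center, 100))
print(within_km(paris, center, 100))
```

A linear scan with this test would touch every geometry; the point of the spatial index is to avoid exactly that.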

For any data set with WGS 84 geo:long and geo:lat values, a simple SQL function makes a point geometry for each such coordinate pair and adds it as the geo:geometry property of the subject with the long/lat. This then enables fast spatial access to arbitrary location data in RDF.

Right now, we hardly see any geometries other than points in RDF data, even though there are some efforts for vocabularies for more complex entities. As these get adopted we will support them.

For scalability, we tried the implementation with OpenStreetMap's 350 million or so points. The geometry implementation partitions well over a cluster, similarly to a full text index, i.e., every server has its slice of the geometries, partitioned by the geometry object's key, thus not by range of coordinates or such. Like this, the items are evenly spread even though the coordinate distribution is highly uneven.
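The effect of partitioning by key rather than by coordinate range can be sketched as follows. This is a toy model, not Virtuoso's partitioning code: the modulo function stands in for whatever hash the cluster actually uses, and the synthetic data (90% of points packed into a narrow band of longitudes) is invented purely to mimic an uneven coordinate distribution.

```python
import random
from collections import Counter

N_PARTITIONS = 8


def partition_by_key(key):
    """Assign a geometry to a partition by its key (modulo as a stand-in hash)."""
    return key % N_PARTITIONS


def partition_by_longitude(lon):
    """Assign a geometry to one of N equal-width longitude slices."""
    width = 360.0 / N_PARTITIONS
    return min(int((lon + 180.0) / width), N_PARTITIONS - 1)


# Synthetic, highly uneven data: 9 of every 10 points fall between 0 and 10 degrees.
rng = random.Random(42)
points = []
for key in range(8000):
    if key % 10 == 0:
        lon = rng.uniform(-180.0, 180.0)
    else:
        lon = rng.uniform(0.0, 10.0)
    points.append((key, lon))

by_key = Counter(partition_by_key(k) for k, _ in points)
by_range = Counter(partition_by_longitude(lon) for _, lon in points)

print("by key:  ", [by_key[p] for p in range(N_PARTITIONS)])
print("by range:", [by_range[p] for p in range(N_PARTITIONS)])
```

With key-based partitioning every partition receives the same share, while range-based partitioning sends the bulk of the points to a single slice. The trade-off is that a spatial lookup must consult every partition, as with a partitioned full text index.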

We can do spatial joins like this:

   SELECT  ?s 
           ( <sql:num_or_null> (?p) )  
           COUNT (*) 
    WHERE  { ?s   <http://dbpedia.org/ontology/populationTotal>  ?p    . 
             FILTER 
               ( <sql:num_or_null> (?p) > 1000000 )                      . 
             ?s   geo:geometry                                   ?geo  .
             FILTER 
               ( <bif:st_intersects> ( ?pt, ?geo, 5 ) )                  . 
             ?xx  geo:geometry                                   ?pt 
           } 
 GROUP BY  ?s 
           ( <sql:num_or_null> (?p) )
 ORDER BY  DESC 3 
    LIMIT  20  

This takes the DBpedia subjects that have a population over 1 million and a geometry. We then count all the geometries within 5 km of the point location of the first geometry. With DBpedia (about 5 million points), GeoNames (7 million points), and OpenStreetMap (350 million points), we get the result:

http://dbpedia.org/resource/Munich                        1356594    117280
http://dbpedia.org/resource/London                        7355400     81486
http://dbpedia.org/resource/Davao_City                    1363337     58640
http://dbpedia.org/resource/Belo_Horizonte                2412937     58640
http://dbpedia.org/resource/Chengde                       3610000     58640
http://dbpedia.org/resource/Hamburg                       1769117     51664
http://dbpedia.org/resource/San_Diego%2C_California       1266731     47685
http://dbpedia.org/resource/Bursa                         1562828     47685
http://dbpedia.org/resource/Port-au-Prince                1082800     47685
http://dbpedia.org/resource/Oakland_County%2C_Michigan    1194156     45636
http://dbpedia.org/resource/Sana%27a                      1747627     40923
http://dbpedia.org/resource/Milan                         1303437     40923
http://dbpedia.org/resource/Campinas                      1059420     40923
http://dbpedia.org/resource/Hohhot                        2580000     40923
http://dbpedia.org/resource/Brussels                      1031215     40923
http://dbpedia.org/resource/Bogra_District                2988567     40923
http://dbpedia.org/resource/Cort%C3%A9s_Department        1202510     40923
http://dbpedia.org/resource/Berlin                        3416300     35668
http://dbpedia.org/resource/New_York_City                 8274527     30810
http://dbpedia.org/resource/Los_Angeles%2C_California     3849378     25614
20 Rows. -- 1733 msec.
Cluster 8 nodes, 1 s. 358 m/s 1596 KB/s 664% cpu 2% read 16% clw threads 1r 0w 0i buffers 1124351 0 d 0 w 0 pfs

This takes 1.7 seconds on a Virtuoso Cluster configured with 8 processes on a single dual-Xeon 5520 box, running at about 664% CPU with warm cache. This is fair enough for a first crack, and it can obviously be optimized further. Still, the geo part of the processing is already as good as instantaneous.

We will shortly have the geography features installed on DBpedia and the other data sets we host. As these come online we will show more demo queries.

For more about SQL/MM, you can look to a couple of PDFs:

11/11/2009 12:17 GMT-0500 Modified: 02/01/2010 09:14 GMT-0500
European Commission and the Data Overflow

The European Commission recently circulated a questionnaire to selected experts on what could be done for the future of big data.

Since the questionnaire is public, I am publishing my answers below.

  1. Data and data types

    1. What volumes of data are we dealing with today? What is the growth rate? Where can we expect to be in 2015?

      Private data warehouses of corporations have more than doubled yearly for the past years; hundreds of TB is not exceptional. This will continue. The real shift is in structured data being published in increasing quantities with a minimum level of integrate-ability through use of RDF and linked data principles. There are rewards for use of standard vocabularies and identifiers through search engines recognizing such data. There is convergence around DBpedia identifiers for real-world entities, e.g., most things that would be in the news.

      This also means that internal data processes and silos may be enriched with this content. There is consequent pressure for accommodating more diversity of data, with more flexible schema.

      Ultimately, all content presently stored in RDBs and presented in publicly accessible dynamic web pages will end up on the web of linked data. Examples are product catalogs, price lists, event schedules and the like.

      The volume of the well-known linked data sets is around 10 billion statements. With the above-mentioned trends, growth by two or three orders of magnitude by 2015 seems reasonable. This is so especially if explicit semantics are extracted from the document web and if there is some further progress in the precision/recall of such extraction.

      Relevant sections of this mass of data are a potential addition to any present or future analytics application.

      Since arbitrary analytics over the database which is the web cannot be economically provided by a centralized search engine, a cloud model may be used for on-demand selection of relevant data and mixing it with private data. This will drive database innovation for the next years even more than the continued classical warehouse growth.

      Science data is another driver of the data overflow. For example, faster gene sequencing, more accurate measurements in high energy physics, better imaging, and remote sensing will produce large volumes of data. This data has highly regular structure but labeling this data with source and lineage calls for a flexible, schema-last, self-describing model, such as RDF and linked data. Data and metadata should travel together but may have different data models.

      By and large, the metadata of science data will be another stream to the web of linked data, at least to the degree it is publicly accessible. Restricted circles can and likely will implement similar ideas.

    2. What types of data can we deal with intelligently due to their inherent structure (geospatial, temporal, social or knowledge graphs, 3D, sensor streams...)?

      All the above types should be supported inside one DBMS so as to allow efficient querying combining conditions on all these types of data, e.g., photos of sunsets taken last summer in Ibiza, with over 20 megapixels, by people I know.

      Note that the test for being a sunset is an operation on the image blob that should be taken to the data; the images cannot be economically transferred.

      Interleaving of all database functions and types becomes increasingly important.

  2. Industries, communities

    1. Who is producing these data and why? Could they do it better? How?

      Right now, projects such as Bio2RDF, Neurocommons, and DBpedia produce this data. The processes are in place and are reasonable. Incremental improvement is to be expected. These processes, along with the linked data meme generally taking off, drive demand for better NLP (Natural Language Processing), e.g., entity and relationship extraction, especially extraction that can produce instance data in given ontologies (e.g., events) using common identifiers (e.g., DBpedia URIs).

      Mapping of RDBs to RDF is possible, and a W3C working group is developing standards for this. The required baseline level has been reached; the rest is a matter of automating deployment. Within the enterprise, there are advantages to be gained for information integration; e.g., all entities in the CRM space can be integrated with all email and support tickets through giving everything a URI. Some of this information may even be published on an extranet for self-service and web-service interfaces. This has been done at small scales and the rest is a matter of spreading adoption and lowering the entry barrier. Incremental progress will take place, eventually resulting in qualitatively better integration along the value chain when adoption is sufficiently widespread.

    2. Who is consuming these data and why? Could they do it better? How?

      Consumers are various. The greatest need is for tools that summarize complex data and allow getting a bird's eye view of what data is in the first instance available. Consuming the data is hindered by the user not even necessarily knowing what data there is. This is somewhat new, as traditionally the business analyst did know the schema of the warehouse and was proficient with SQL report generators and statistics packages.

      Where Web 2.0 made the citizen journalist, the web of linked data will make the citizen analyst. For this to happen, with benefits for individuals, enterprises, and governments alike, more work in user interfaces, knowledge discovery, and query composition will be useful. We may envision a "meshup economy" where data is plentiful, but the unit of value and exchange is the smart report that crystallizes actionable value from this ocean.

    3. What industrial sectors in Europe could become more competitive if they became much better at managing data?

      Any sector could benefit. Early adopters are seen in the biomedical field and to an extent in media.

    4. Is the regulation landscape imposing constraints (privacy, compliance ...) that don't have today good tool support?

      The regulation landscape drives database demand through data retention requirements and the like.

      With data integration, especially with privacy-sensitive data (as in medicine), there are issues of whether one dares put otherwise-shareable information online. Regulation is needed to protect individuals, but integration should still be possible for science.

      For this, we see a need for progress in applying policy-based approaches (e.g., row level security) to relatively schema-last data such as RDF. This is possible but needs some more work. Also, creating on-the-fly-anonymizing views on data might help.

      More research is needed for reconciling the need for security with the advantages of broad-based ad hoc integration. Ideally, data should be intelligent, aware of its origins and classification and cautious of whom it interacts with, all of this supported under the covers so that the user could ask anything but the data might refuse to answer or might restrict answers according to the user's profile. This is a tall order and implementing something of the sort is an open question.

    5. What are the main practical problem identified for individuals and organizations? Please give examples and tell us about the main obstacles and barriers.

      We have come across the following:

      • Knowing that the data exists in the first place.
      • If the data is found, figuring out the provenance, units and precision of measurement, identifiers, and the like.
      • Compatible subject matter but incompatible representation: For example, one has numbers on a map with different maps for different points in time; another has time series of instrument data with geo-location for the instrument. It is only to be expected that the time interval between measurements is not the same. So there is need for a lot of one-off programming to align data.
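The kind of one-off alignment code mentioned in the last point can be sketched as below. This is a minimal illustration, assuming both sources can be reduced to sorted (time, value) pairs and that linear interpolation between samples is acceptable; the function names and the sample data are hypothetical.

```python
from bisect import bisect_left


def interpolate_at(series, t):
    """Linearly interpolate a sorted [(time, value), ...] series at time t.

    Returns None when t lies outside the range covered by the series.
    """
    times = [ts for ts, _ in series]
    if not series or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return series[i][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(reference, other):
    """Pair each reference sample with the other series' value at that time."""
    return [(t, v, interpolate_at(other, t)) for t, v in reference]


# Hypothetical data: map snapshots every 10 time units, a sensor reading every 4.
snapshots = [(0, 1.0), (10, 2.0), (20, 3.0)]
sensor = [(0, 10.0), (4, 14.0), (8, 18.0), (12, 22.0), (16, 26.0)]

aligned = align(snapshots, sensor)
print(aligned)
```

Even this toy version has to decide what to do outside the overlap of the two series (here it yields None), which is the sort of per-dataset judgment that makes such alignment hard to automate.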

      Other problems have to do with sheer volume, i.e., transfer of data even in a local area network is too slow, let alone over a wide area network. Computation needs to go to the data, and databases need to support this.

  3. Services, software stacks, protocols, standards, benchmarks

    1. What combinations of components are needed to deal with these problems?

      Recent times have seen a proliferation of special-purpose databases. Since the data needs of the future are about combining data with maximum agility and minimum performance hit, there is a need to gather the currently separate functionality into an integrated system with sufficient flexibility. We see some of this in the integration of map-reduce and scale-out databases. The former antagonists have become partners. Vertica, Greenplum, and OpenLink Virtuoso are examples of DBMSs featuring work in this direction.

      Interoperability and at least de facto standards in ways of doing this will emerge.

    2. What data exchange and processing mechanisms will be needed to work across platforms and programming languages?

      HTTP, XML, and RDF are in fact very verbose, yet these are the formats and models that have uptake. Thus, these will continue to be used even though one might think binary formats to be more efficient.

      There are of course science data set standards that are more compressed and these will continue, hopefully adding a practice of rich metadata in RDF.

      For internals of systems, MPI and TCP/IP with proprietary optimized wire formats will continue. Inter-system communication will likely continue to be HTTP, XML, and RDF as appropriate.

    3. What data environments are today so wastefully messy that they would benefit from the development of standards?

      RDF and OWL are not messy but they could use some more performance; we are working on this. SPARQL is finally acquiring the capabilities of a serious query language, so things are slowly coming together.

      Community process for developing application domain specific vocabularies works quite well, even though one could argue it is ad hoc and not up to what a modeling purist might wish.

      Top-down imposition of standards has a mixed history, with long and expensive development and sometimes little or no uptake; consider some WS-* standards, for example.

    4. What kind of performance is expected or required of these systems? Who will measure it reliably? How?

      Relational databases have a history of substantial investment in optimization and some of them are very good for what they do, e.g., the newer generation of analytics databases.

      The very large schema-last, no-SQL, sometimes eventually consistent key-value stores have a somewhat shorter history but do fill a real need.

      These trends will merge: Extreme scale, schema-last, complex queries, even more complex inference, custom code for in-database machine learning and other bulk processing.

      We find RDF augmented with some binary types at this crossroads. This point of the design space will have to provide performance roughly on the level of today's best relational solution for workloads that fit the relational model. The added cost of schema-last and inference must come down. We are working on this. Research work such as carried out with MonetDB gives clues as to how these aims can be reached.

      The separation of query language and inference is artificial. After the concepts are mature, these functions will merge and execute close to the data; there are clear evolutionary pressures in this direction.

      Benchmarks are key. Some gain can be had even from repurposing standard relational benchmarks like TPC-H. But the TPC-H rules do not allow official reporting of such.

      Development of benchmarks for RDF, complex queries, and inference is needed. A bold challenge to the community, it should be rooted in real-life integration needs and involve high heterogeneity. A key-value store benchmark might also be conceived. A transaction benchmark like TPC-C might be the basis, maybe augmented with massive user-generated content like reviews and blogs.

      If benchmarks exist and are not too easy nor inaccessibly difficult nor too expensive to run — think of the high end TPC-C results — then TPC-style rules and processes would be quite adequate. The threshold to publish should be lowered: Everybody runs the TPC workloads internally but few publish.

      Some EC initiative for benchmarking could make sense, similar to the TREC initiative of the US government. Industry should be consulted for the specific content; possibly the answers to the present questionnaire can provide an approximate direction.

      Benchmarks should be run by software vendors on their own systems, tuned by themselves. But there should be a process of disclosure and auditing; the TPC rules give an example. Compliance should not be too expensive or time consuming. Some community development for automating these things would be a worthwhile target for EC funding.

  4. Usability and training

    1. How difficult will it be for a developer of average competence to deploy components whose core is based on rather deep computer science? Do we all need to understand Monads and Continuations? What can be done to make it ever easier?

      In the database world, huge advances in technology have taken place behind a relatively simple and stable interface: SQL. For the linked data web, the same will take place behind SPARQL.

      Beyond these, things get harder. Programming an arbitrary algorithm with MPI so that it makes good use of a cluster platform, for example, is quite difficult. The casual amateur is hereby warned.

      There is no single solution. Since explicit, programmatic parallelization with MPI, for example, scales very poorly in terms of required skill, we should favor declarative and/or functional approaches that lend themselves to automatic parallelization.

      Developing a debugger and explanation engine for rule-based and description-logics-based inference would be an idea.

      For procedural workloads, things like Erlang may be good in cases and are not overly difficult in principle, especially if there are good debugging facilities.

      For shipping functions in a cluster or cloud, the BOOM (Berkeley Orders Of Magnitude) approach or logic programming with explicit specification of compute location seem promising, surely more flexible than map-reduce. The question is whether a PHP developer can be made to do logic programming.

      This bridge will be crossed only with actual need and even then reluctantly. We may look at the Web 2.0 practice of sharding MySQL, inconvenient as this may be, for an example. There is inertia and thus re-architecting is a constant process that is generally in reaction to facts, post hoc, often a point solution. One could argue that planning ahead would be smarter but by and large the world does not work so.

      One part of the answer is an infinitely-scalable SQL database that expands and shrinks in the clouds, with the usual semantics, maybe optional eventual consistency and built-in map reduce. If such a thing is inexpensive enough and syntax-level-compatible with the present installed base, many developers will not have to learn very much more.

      This is maybe good for the bread-and-butter IT, but European competitiveness should not rest on this. Therefore we wish to go for bold new application types for which the client-server database application is not the model. Data-centric languages like BOOM, if they can be made very efficient and have good debugging support, are attractive there. These do require more intellectual investment but that is not a problem since the less-inquisitive part of the developer community is served by the first part of the answer.

    2. How is a developer of average skills going to learn about these new advanced tools? How can we plan for excellent documentation and training, community mentoring, exchange of good practices, etc... across all EU countries?

      For the most part, developers do not learn things for the sake of learning. When they have learned something and it is adequate, they stay with it for the most part and are even reluctant to engage in cross-camps interaction. The research world is often similarly insular. A new inflection in the application landscape is needed to drive learning. This inflection is provided by the ubiquity of mobile devices, sensor data, explicit semantics, NLP concept extraction, web of linked data, and such factors.

      RDFa is a good example of a new technique piggybacking on something everybody uses, namely HTML. These new things should, within possibility, be deployed in the usual technology stack, LAMP or Java. Of course these do not have to be LAMP or Java or HTML or HTTP themselves but they must manifest through these.

      A lot of the semantic web potential can be realized within the client-server database application model, thus no fundamental re-architecting, just some new data types and queries.

      For data- or processing-intensive tasks, an on-demand hookup to cloud-based servers with Erlang and/or BOOM as the programming model would be easy enough to learn and utilize.

      The question is one of providing challenges. Addressing actual challenges with these techniques will lead to maturity, documentation, examples, and training. With virtual, Europe-wide distributed teams a reality in many places, Europe-wide dissemination is no longer insurmountable.

      As the data overflow proceeds, its victims will multiply and create demand for solutions. The EC could here encourage use cases from research projects to gain an extended life past the end of those projects, possibly being maintained, multiplied, and spun off.

      If such things could be mutated into self-sustaining service businesses with pay-per-use revenue, say through a cloud SaaS business model, still primarily leveraging an open source technology stack, we could have self-propagating and self-supporting models for exploiting advanced IT. This would create interest, and interest would drive training and dissemination.

      The problem is creating the pull.

  5. Challenges

    1. What should be, in this domain, the equivalent of the Netflix challenge, Ansari X Prize, Google Lunar X Prize, etc. ... ?

      The EC itself no doubt suffers from data overflow in one function or another. Unless security/secrecy prohibits, simply publishing a large data set and a description of what operations should be done on it would be a start. The more real the data, the better — reality is consistently more complex and surprising than imagination. Since many interesting problems touch on fraud detection and law enforcement, there may be some security obstacles for using these application domains as subject matters of open challenges.

      Once there is a good benchmark, as discussed above, there can be some prize money allocated for the winners, especially if the race is tight.

      The Semantic Web Challenge and the Billion Triples Challenge exist and are useful as such, but do not seem to have any huge impact.

      The incentives should be sufficient, and part of the expenses arising from running for such challenges could be funded. Otherwise investing in existing business development will be more interesting to industry. Some industry participation seems necessary; we would wish academia and industry to work more closely together. Also, having industry supply the baseline guarantees that academia actually does further the state of the art. This is not always certain.

      If challenges are based on actual problems, whether of the EC, its member governments, or private entities, and winning the challenge may lead to a contract for supplying an actual solution, these will naturally become more interesting for consortia involving integrators, specialist software vendors, and academia. Such a model would build actual capacity to deploy leading edge technologies in production, which is sorely needed.

    2. What should one do to set up such a challenge, administer, and monitor it?

      The EC should probably circulate a call for actual problem scenarios involving big data. If the matter of the overflow is as dire as represented, cases should be easy to find. A few should be selected and then anonymized if needed.

      The party with the use case would benefit by having, hopefully, the best teams work on it. The contestants would benefit from having real world needs guide R&D. The EC would not have to do very much, except possibly use some money for funding the best proposals. The winner would possibly get a large account and related sales and service income. The contestants would have to be teams possibly involving many organizations; for example, development and first-line services and support could come from different companies along a systems integrator model such as is widely used in the US.

      There may be a good benchmark at the time, possibly resulting from FP7 itself. In such a case, the EC could offer a prize for winners. Details would have to be worked out case by case. Such a challenge could be repeated a few times, as benchmark-driven progress, in databases or in TREC for example, has taken some years to reach a point of slowdown.

      Administering such an activity should not be prohibitive, as most of the expertise can be found with the stakeholders.

10/27/2009 13:29 GMT-0500 Modified: 10/27/2009 14:57 GMT-0500
VLDB 2009 Web Scale Data Management Panel (5 of 5)

"The universe of cycles is not exactly one of literal cycles, but rather one of spirals," mused Joe Hellerstein of UC Berkeley.

"Come on, let's all drop some ACID," interjected another.

"It is not that we end up repeating the exact same things, rather even if some patterns seem to repeat, they do so at a higher level, enhanced by the experience gained," continued Joe.

Thus did the Web Scale Data Management panel conclude.

Whether successive generations are made wiser by the ones that have gone before may be argued either way.

The cycle in question was that of developers discovering ACID (Atomicity, Consistency, Isolation, Durability) in the 1960s. Thus did the DBMS come into being. Then DBMSs kept becoming more complex until, as there will be a counter-force to each force, came the meme of key value stores and BASE: no multiple-row transactions, eventual consistency, no query language, but scaling to thousands of computers. So now, the DBMS community asks itself what went wrong.

In the words of one panelist, another demonstrated a "shocking familiarity with the subject matter of substance abuse" when he called for the DBMS community to get on a 12-step program and to look at where addiction to certain ideas, ACID among them, had brought its life. Look at yourself: The influential papers in what ought to be your space by rights are coming from the OS community: Google Bigtable, Amazon Dynamo, want more? When you ought to drive, you give excuses and play catch up! Stop denial, drop SQL, drop ACID!

The web developers have revolted against the time-honored principles of the DBMS. This is true. Sharded MySQL is not the ticket — or is it? Must they rediscover the virtues of ACID, just like the previous generation did?

Nothing under the sun is new. As in music and fashion, trends keep cycling also in science and engineering.

But seriously, does the full-featured DBMS scale to web scale? Microsoft says the Azure version of SQL Server does. Yahoo says they want no SQL but Hadoop and PNUTS.

Twitter, Facebook, and other web names got their own discussion. Why do they not go to serious DBMS vendors for their data but make their own, like Facebook with Hive?

Who can divine the mind of the web developer? What makes them go to memcached, manually sharded MySQL, and MapReduce, walking away from the 40 years of technology invested in declarative query and ACID? What is this highly visible but hard to grasp entity? My guess is that they want something they can understand, at least at the beginning. A DBMS, especially on a cluster, is complicated, and it is not so easy to say how it works and how its performance is determined. The big brands, if deployed on a thousand PCs, would also be prohibitively expensive. But if all you do with the DBMS is single row selects and updates, it is no longer so scary, but you end up doing all the distributed things in a middle layer, and abandoning expressive queries, transactions, and database-supported transparency of location. But at least now you know how it works and what it is good/not good for.

This would be the case for those who make a conscious choice. But by and large the choice is not deliberate; it is something one drifts into: The application gains popularity; the single LAMP box can no longer keep everything in memory; you need a second MySQL in the LAMP and you decide that users A–M go left and N–Z right (horizontal partitioning). This siren of sharding beckons you and all is good until you hit the reef of re-architecting. Memcached and duct-tape help, like aspirin helps with a hangover, but the root cause of the headache lies unaddressed.
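To make the A–M/N–Z drift concrete, here is a minimal sketch of range sharding by first letter, and of what happens when a third shard is added later. The function names and the boundary letters are made up for illustration; this is not how any particular site does it.

```python
import bisect

def shard_for(username, boundaries=("N",)):
    """Return the shard index for a user, by first letter of the name.

    `boundaries` holds the first letter of every shard except the first,
    so ("N",) means shard 0 gets A-M and shard 1 gets N-Z.
    """
    first = username[:1].upper()
    return bisect.bisect_right(boundaries, first)

def moved_users(users, old_boundaries, new_boundaries):
    """List the users whose rows must be copied when the layout changes."""
    return [u for u in users
            if shard_for(u, old_boundaries) != shard_for(u, new_boundaries)]

users = ["alice", "harry", "mike", "nina", "zed"]
# Going from two shards (A-M, N-Z) to three (A-G, H-P, Q-Z).
to_move = moved_users(users, ("N",), ("H", "Q"))
```

In this toy case three of the five users change shards, and the application code that knows the boundaries has to change with them. That is the reef in miniature.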

The conclusion was that there ought to be something incrementally scalable from the get-go. Low cost of entry and built-in scale-out. No, the web developers do not hate SQL; they just have gotten the idea that it does not scale. But they would really wish it to. So, DBMS people, show there is life in you yet.

Joe Hellerstein was the philosopher and paradigmatician of the panel. His team had developed a protocol-compatible Hadoop in a few months using a declarative logic programming style approach. His claim was that developers made the market. Thus, for writing applications against web scale data, there would have to be data centric languages. Why not? These are discussed in Berkeley Orders Of Magnitude (BOOM).

I come from Lisp myself, way back. I have since abandoned any desire to tell anybody what they ought to program in. This is a bit like religion: Attempting to impose or legislate or ram it on somebody just results in anything from lip service to rejection to war. The appeal exerted by the diverse language/paradigm -isms on their followers seems to be based on hitting a simplification of reality that coincides with a problem in the air. MapReduce is an example of this. PHP is another. A quick fix for a present need: Scripting web servers (PHP) or processing tons of files (MapReduce). The full database is not as quick a fix, even though it has many desirable features. It is also not as easy to tell what happens inside one, so MapReduce may give a greater feeling of control.

Totally self-managing, dynamically-scalable RDF would be a fix for not having to design or administer databases: Since it would be indexed on everything, complex queries would be possible; no full database scans would stop everything. For the mid-size segment of web sites this might be a fit. For the extreme ends of the spectrum, the choice is likely something custom built and much less expressive.

The BOOM rule language for data-centric programming would be something very easy for us to implement; in fact, we will get something of the sort essentially for free when we do the rule support already planned.

The question is, can one induce web developers to do logic? The history is one of procedures, both in LAMP and MapReduce. On the other hand, the query languages that were ever universally adopted were declarative, i.e., keyword search and SQL. There certainly is a quest for an application model for the cloud space beyond just migrating apps. We'll see. More on this another time.

09/01/2009 12:24 GMT-0500 Modified: 09/02/2009 12:05 GMT-0500
VLDB 2009 Yahoo Keynote (4 of 5)

Raghu Ramakrishnan of Yahoo! gave a keynote about PNUTS, the Yahoo solution for managing massive user data, from front page preferences to mail to social networks.

Dynamic scale, wide area replication, and high availability are the issues. Transactions on multiple records, complex queries, and absolute consistency at all times are traded off. Also, the programming interfaces are lower level than with SQL. Replication and consistency rules are choices for the application developer; the platform offers some basic alternatives. Implementation-wise, there is a MySQL back-end and all the partitioning, query routing, replication, and balancing take place in a layer of front-ends.

Now what do we say to this?

In the Yahoo! case, even if complex queries were possible, which they are not, one would probably keep them off the online system since latency and availability are everything. A latency of some tens of milliseconds is however acceptable, which is not so terrible for single record operations: There is time for a couple of messages on the data center network and even maybe for a disk read.

PNUTS is probably the fastest way of getting to the desired beachhead of simple access to data at infinite scale in multiple geographies. In the identical situation, I might have done something similar.

But we are in a different situation, concerned with complex queries, a highly-normalized schema-last situation, i.e., index on everything, large objects normalized away, as is done in RDF. Then we are also in the relational situation. Infinite scale, fault tolerance, and wide-area replication do come up regularly in user needs. The applications for which people would like RDF are not only complex reasoning things but very big metadata stores for user generated content, social networks, and the like.

Which of the PNUTS principles could we apply?

  • Division in tablets: When a partition of the data grows too big, it should split.

  • Migration of partitions: as capacity/demand change, partitions should migrate so as to equalize load.

  • High availability: This is divided in two — on one hand inside the data center; on the other between data centers. Inside the data center, storing partitions in duplicate and running them synchronously is possible. This is manifestly impossible in wide area settings, though. For this, we need a log-shipping style of asynchronous replication. But how does one deal with split networks and transfer of replication mastery?
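The first two items above, splitting and migration, can be sketched in a few lines. This is only an illustration under simplifying assumptions: a tablet is a plain dict of rows, load is measured by tablet count, and the threshold `MAX_ROWS` and the greedy balancing rule are invented here rather than taken from PNUTS.

```python
MAX_ROWS = 4  # hypothetical split threshold

def split_if_needed(tablet, max_rows=MAX_ROWS):
    """Split a tablet (dict of key -> row) at the median key once it grows too big."""
    if len(tablet) <= max_rows:
        return [tablet]
    keys = sorted(tablet)
    mid = len(keys) // 2
    return [{k: tablet[k] for k in keys[:mid]},
            {k: tablet[k] for k in keys[mid:]}]

def rebalance(nodes):
    """Move tablets from the busiest node to the idlest until counts differ by at most 1.

    `nodes` maps node name -> list of tablets and is modified in place.
    Returns the list of (source, destination) moves that were made.
    """
    moves = []
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return moves
        nodes[idlest].append(nodes[busiest].pop())
        moves.append((busiest, idlest))

halves = split_if_needed({k: None for k in "abcde"})
cluster = {"node1": [{"a": 1}, {"b": 2}, {"c": 3}, {"d": 4}], "node2": []}
migrations = rebalance(cluster)
```

A real system would weigh tablets by size and request rate rather than by count, and would have to keep serving reads and writes while a tablet is in flight, which is where the actual difficulty lies.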

PNUTS determines the master copy record by record. This makes sense when the record, for example, corresponds to a user. For RDF, doing this by the triple would be prohibitive. Doing this by the graph, or by the subject of a set of triples across all graphs, would be better. We would agree with PNUTS that transferring mastery by the storage chunk is not desired, as the chunk will contain arbitrary unrelated data.

The eventual consistency mechanisms can be generalized to RDF readily enough. In a social RDF application, the graph is the most likely unit of data ownership and update authorization, so the graph would also be the unit of eventual consistency. Keeping a separate data structure listing recent inserts/deletes to a graph with timestamps would serve for establishing consistency. The size of this would be a small fraction of the size of the graph.
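As a rough sketch of the per-graph change list, assume each replica of a graph keeps a log of timestamped inserts and deletes, and that two replicas reconcile by merging their logs and letting the latest operation on each triple win. The class, the last-writer-wins rule, and the assumption of globally comparable timestamps are all choices made for this illustration, not a description of PNUTS or of any existing implementation.

```python
class GraphReplica:
    """One copy of a single RDF graph plus its log of recent changes."""

    def __init__(self, triples=()):
        self.triples = set(triples)
        self.log = []  # entries are (timestamp, op, triple), op is "+" or "-"

    def insert(self, ts, triple):
        self.triples.add(triple)
        self.log.append((ts, "+", triple))

    def delete(self, ts, triple):
        self.triples.discard(triple)
        self.log.append((ts, "-", triple))

def reconcile(a, b):
    """Bring two replicas of the same graph to the same state.

    Assumes timestamps are comparable across replicas. Entries are replayed
    in timestamp order, so the latest operation on a triple wins; on equal
    timestamps "+" sorts before "-", so a delete wins the tie.
    """
    merged = sorted(set(a.log) | set(b.log))
    latest = {}
    for _ts, op, triple in merged:
        latest[triple] = op
    for replica in (a, b):
        for triple, op in latest.items():
            if op == "+":
                replica.triples.add(triple)
            else:
                replica.triples.discard(triple)
        replica.log = list(merged)

base = ("alice", "knows", "bob")
left, right = GraphReplica([base]), GraphReplica([base])
left.insert(1, ("alice", "knows", "carol"))
right.delete(2, base)
right.insert(3, ("alice", "likes", "rdf"))
reconcile(left, right)
```

Only the triples mentioned in the logs are touched, which matches the point that the change list stays small relative to the graph. Truncating the log once all replicas have seen an entry is left out here.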

RDF cannot do anything without joining between partitions, whereas for PNUTS the join between partitions is an application matter. But then PNUTS does have an extra step of RPC between the PNUTS infrastructure and the back-end. Doing query routing in the back-end gets rid of this. RDF does remain more dependent on even performance and short interconnect latencies, though. It also likely takes more space. But the essential consistency and availability features can be generalized to it, providing the merge of semi-structured data at infinite scale and availability with complex query.

At any rate, repartitioning-on-demand and partition-migration remain the key agenda items for us, confirmed over and over at VLDB.

09/01/2009 12:04 GMT-0500 Modified: 09/01/2009 17:32 GMT-0500
VLDB 2009 TPC Workshop (3 of 5)

Michael Stonebraker gave the keynote at the TPC workshop. His message was that the TPC, at the venerable age of 21, was already a decade late in reinventing itself. From the height of relevance at the time of the debit/credit benchmark twenty years back, it was slipping into the sunset of irrelevance unless it paid attention.

Now we are great fans of the TPC and while we have not published results by the TPC book, we have extensively used TPC material for guiding optimization, as has pretty much everybody else.

It is true that the rules encourage unrealistic configurations. The emphasis on random access from disk that is built into the rules leads to disk configurations that are very improbable in practice, such as 1PB of disks for 3TB of data, just so there are enough disk arms in parallel. Stonebraker also pointed out that replication and failover were ubiquitous in real life and that roll forward from logs was unrealistic as a recovery model since it took so long. Benchmarks should therefore include replication.

Further, Stonebraker challenged the TPC to go for the new frontier, which he described as the huge data sets in science and on big web sites. Scientists, the ones who would save our planet from the diverse ills confronting it, do not like relational databases. They avoid them when they can. They want arrays for physics, and graphs for biology and chemistry. MapReduce is eating the database's lunch; what will you do about this?

I later suggested incorporating an RDF metadata benchmark into the TPC suite. We'll see about this; we'll first have to come up with a suitable one. There is a great deal of pressure for making good RDF benchmarks but this is not yet in the center of the mainstream that TPC tends to cover.

TPC's own talk was about the life cycle of benchmarks. A benchmark begins a bit ahead of the mainstream, with a problem that is difficult but not so difficult as to be uncommon. When the solution to this problem becomes commonplace, the benchmark's relevance gradually drops.

There was a talk on robustness of query plans which was well to the point. Indeed, there are performance cliffs at certain points; for example, when passing from memory-only to disk-pageable data structures, or when switching from indexed access to table scans, or from loop to hash joins. Quite so. The analysis I really would have liked to see would have been one of what happens when passing from a single server to a cluster, and from local joins to cross-partition ones. A contrast of cache fusion and partitioning would also have been welcome. We have our own data and experience but we find we don't have time to measure all the other systems.

Anyway it is good to raise the question of smooth and predictable performance.

09/01/2009 11:51 GMT-0500 Modified: 09/01/2009 17:32 GMT-0500
Some Interesting VLDB 2009 Papers (2 of 5)

Intel on Hash Join

Intel and Oracle had measured hash and sort-merge joins on Intel Core i7. The result was that hash join with both tables partitioned to match CPU cache was still the best but that sort-merge would catch up with more SIMD instructions in the future.

We should probably experiment with this but the most important partitioning of hash joins is still between cluster nodes. Within the process, we will see. The tradeoff of doing all in cache-sized partitions is larger intermediate results which in turn will impact the working set of disk pages in RAM. For one-off queries this is OK; for online use this has an effect.
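For readers who have not seen the technique, a partitioned hash join can be sketched as follows. Both inputs are first split by a hash of the join key, and then each pair of matching partitions is joined with an ordinary in-memory hash table. In a real engine the partition count would be chosen so that each hash table fits in CPU cache; here `num_partitions` is just a parameter, rows are tuples with the join key in the first position, and Python of course shows only the logic, not the cache behavior.

```python
from collections import defaultdict

def partition(rows, num_partitions):
    """Split rows into partitions by a hash of the join key (first column)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts

def partitioned_hash_join(build, probe, num_partitions=4):
    """Equi-join two lists of tuples on their first column, one partition pair at a time."""
    out = []
    build_parts = partition(build, num_partitions)
    probe_parts = partition(probe, num_partitions)
    for b_part, p_part in zip(build_parts, probe_parts):
        table = defaultdict(list)  # small per-partition hash table
        for row in b_part:
            table[row[0]].append(row)
        for row in p_part:
            for match in table.get(row[0], ()):
                out.append(match + row[1:])
    return out

build = [(1, "a"), (2, "b"), (3, "c")]
probe = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(partitioned_hash_join(build, probe))
```

Note that `partition` materializes a full copy of both inputs before any output is produced. That copy is the larger intermediate result mentioned above.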

1000 TABLE Queries

SAP presented a paper about federating relational databases. Queries would be expressed against VIEWs defined over remote TABLEs, UNIONed together and so forth. Traditional methods of optimization would run out of memory; a single 1000 TABLE plan is already a big thing. Enumerating multiple variations of such is not possible in practice. So the solution was to plan in two stages — first arrange the subqueries and derived TABLEs, and then do the JOIN orders locally. Further, local JOIN orders could even be adjusted at run time based on the actual data. Nice.

Oracle Subqueries and New Implementation of LOBs

Oracle presented some new SQL optimizations, combining and inlining subqueries and derived TABLEs. We do fairly similar things and might extend the repertoire of tricks in the direction outlined by Oracle as and when the need presents itself. This further confirms that SQL and other query optimization is really an incremental collection of specially recognized patterns. We still have not found any other way of doing it.

Another interesting piece by Oracle was about their re-implementation of large object support, where they compared LOB loading to file system and raw device speeds.

Amadeus CRS booking system, steady query time for arbitrary single table queries

There was a paper about a memory-resident database that could give steady time for any kind of single-table scan query. The innovation was to not use indices, but to have one partition of the table per processor core, all in memory. Then each core would have exactly two cursors — one reading, the other writing. The write cursor should keep ahead of the read cursor. Like this, there would be no read/write contention on pages, no locking, no multiple threads splitting a tree at different points, none of the complexity of a multithreaded database engine. Then, when the cursor would hit a row, it would look at the set of queries or updates and add the result to the output if there was a result. The data indexes the queries, not the other way around. We have done something similar for detecting changes in a full text corpus but never thought of doing queries this way.
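The inversion, where the data is scanned once and each row is checked against the whole batch of pending work, can be sketched as below. This is a single-threaded simplification with invented names: pending updates are applied to a row before the pending queries look at it, which stands in for the write cursor staying ahead of the read cursor. It is not the paper's implementation.

```python
def scan_once(partition, updates, queries):
    """Serve a whole batch of updates and queries with one pass over the partition.

    partition: list of row dicts, modified in place.
    updates:   list of (matches, change) pairs; change(row) returns the new row.
    queries:   dict of query name -> predicate on a row.
    Returns a dict of query name -> list of matching rows.
    """
    results = {name: [] for name in queries}
    for i, row in enumerate(partition):
        # "Write cursor": apply every pending update that touches this row.
        for matches, change in updates:
            if matches(row):
                row = change(row)
        partition[i] = row
        # "Read cursor": test the row against every pending query.
        for name, matches in queries.items():
            if matches(row):
                results[name].append(row)
    return results

bookings = [
    {"flight": "AF1", "seat": "1A", "status": "open"},
    {"flight": "AF1", "seat": "1B", "status": "booked"},
    {"flight": "LH2", "seat": "2C", "status": "open"},
]
updates = [(lambda r: r["seat"] == "1A", lambda r: {**r, "status": "booked"})]
queries = {
    "open_seats": lambda r: r["status"] == "open",
    "flight_af1": lambda r: r["flight"] == "AF1",
}
answers = scan_once(bookings, updates, queries)
```

The cost of one pass depends on the partition size and the number of pending operations, not on which columns the queries happen to filter on, which is where the steady response time comes from.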

Well, we are all about JOINs so this is not for us, but it deserves a mention for being original and clever. And indeed, anything one can ask about a table will likely be served with great predictability.

Greenplum

Google's chief economist said that the winning career choice would be to pick a scarce skill that made value from something that was plentiful. For the 2010s this career is that of the statistician/data analyst. We've said it before: the next web is analytics for all. The Greenplum talk was divided between a use case and a technical part. The use case was Fox, with 200TB of data about ads, web site traffic, and other things, growing 5TB a day. The message was that cubes and drill down are passé, that it is about complex statistical methods that have to run in the database, that the new kind of geek is the data geek, whose vocation it is to consume and spit out data, discover things in it, and so forth.

The technical part was about Greenplum, a SQL database running on a cluster with a PostgreSQL back-end. The interesting points were embedding MapReduce into SQL, and using relational tables for arrays and complex data types — pretty much what we also do. Greenplum emphasized scale-out and found column orientation more like a nice-to-have.

MonetDB, optimizing database for CPU cache

The MonetDB people from CWI in Amsterdam gave a 10 year best paper award talk about optimizing database for CPU cache. The key point was that if data is stored as columns, it ought also to be transferred as columns inside the execution engine. Materialize big chunks of state to cut down on interpretation overhead and use cache to best effect. They vector for CPU cache; we vector for scale-out, since the only way to ship operations is to ship many at a time. So we might as well vector also in single servers. This could be worth an experiment. Also we regularly visit the topic of column storage. But we are not yet convinced that it would be better than row-style covering indices for RDF quads. But something could certainly be tried, given time.

09/01/2009 11:46 GMT-0500 Modified: 09/01/2009 17:32 GMT-0500
VLDB 2009 (1 of 5)

I was at the VLDB 2009 conference in Lyon, France. I will in the next few posts discuss some of the prominent themes and how they relate to our products or to RDF and Linked Data.

Firstly, RDF was as good as absent from the presentations and discussions we saw. There were a few mentions in the panel on structured data on the web; however, RDF was not in any way seen to be essential for this. There were also a couple of RDF mentions in questions at other sessions, but that was about it.

It is a common perception that RDF and database people do not talk with each other. Evidence seems to bear this out.

As a database developer I did get a lot of readily applicable ideas from the VLDB talks. These run across the whole range of DBMS topics, from key compression and SQL optimization, to column storage, CPU cache optimization, and the like. In this sense, VLDB is directly relevant to all we do. In a conversation, someone was mildly confused that I should on one hand mention I was doing RDF, and on the other hand also be concerned about database performance. These things are not seen to belong together, even though making RDF do something useful certainly depends on a great deal of database optimization.

The question of all questions — that of infinite scale-out with complex queries, resilience, replication, and full database semantics — was strongly in the air.

But it was in the air more as a question than as an answer. Not very much at all was said about the performance of distributed query plans, of 2pc (two-phase commit), of the impact of interconnect latency, and such things. On the other hand, people were talking quite liberally about optimizing CPU cache and local multi-core execution, not to mention SQL plans and rewrites. Also, almost nothing was said about transactions.

Still, there is bound to be a great deal of work in scale-out of complex workloads by any number of players. Either these things are all figured out and considered self-evidently trivial, or they are so hot that people will go there only by way of allusion and vague reference. I think it is the latter.

By and large, we were confirmed in our understanding that infinite scale-out on the go, with redundancy, is the ticket, especially if one can offer complex queries and transactional semantics coupled with instant data loading and schema-last.

Column storage and cache optimizations seem to come right after these.

Certainly the database space is diversifying.

MapReduce was discussed quite a bit, as an intruder into what would be the database turf. We have no great problem with MapReduce; we do that in SQL procedures if one likes to program in this way. Greenplum also seems to have come by the same idea.

As said before, RDF and RDF reasoning were ignored. Do these actually offer something to the database side? Certainly for search, integration, and resource discovery, linked data has evident advantages.

Two points of the design space — the warehouse, and the web-scale key-value store — got a lot of attention. Would I do either in RDF? RDF is a slightly different design space point, like key-value with complex queries — on the surface, a fusion of the two. As opposed to RDF, the relational warehouse gains from fixed data-types and task-specific layout, whether row or column. The key-value store gains from having a concept of a semi-structured record, a bit like the RDF subject of a triple, but now with ad-hoc (if any) secondary indices, and inline blobs. The latter is much simpler and more compact than the generic RDF subject with graphs and all, and can be easily treated as a unit of version control and replication mastering. RDF, being more generic and more normalized, is representationally neither as ad-hoc nor as compact.

But RDF will be the natural choice when complex queries and ad-hoc schema meet, for example in web-wide integrations of application data.

There seems to be a huge divide in understanding between database-developing people and those who would be using databases. On one side, this has led to a back-to-basics movement with no SQL, no ACID, key-value pairs instead of schema, MapReduce instead of fancy but hard-to-follow parallel execution plans. On the other side, the database space specializes more and more; it is no longer simply transactions vs. analytics, but many more points of specialization.

Some frustration can be sensed in the ivory towers of science when it is seen that the ones most in need of database understanding in fact have the least. Google, Yahoo!, and Microsoft know what they are doing, with or without SQL, but the medium-size or fast-growing web sites seem to be in confusion when LAMP or Ruby or the scripting-du-jour can no longer cut it.

Can somebody using a database be expected to understand how it works? I would say no, not in general. Can a database be expected to unerringly self-configure based on workload? Sure, a database can suggest layouts, but it ought not restructure itself on the spur of the moment under full load.

It is safe to say that the community at large no longer believes in "one size fits all". Since there is no general solution, there is a fragmented space of specific solutions. We will be looking at some of these issues in the following posts.

09/01/2009 11:30 GMT-0500 Modified: 09/01/2009 16:53 GMT-0500