Microsoft.NET

……………………………………………….Expertise in .NET Technologies

Archive for September, 2009

ADO.NET: Part 1 – Data Access – History

Posted by Ravi Varma Thumati on September 24, 2009

History

            The main goal of all APIs is to provide universal data access by means of having a single code base for accessing data from any source, from any language.

Having universal data access is important for four reasons:

  • Developers can easily work on applications targeting different data stores without needing to become experts on each one.  
  • Developers can have a common framework for data access when switching between programming languages, making the transition to new languages easier.
  • It enables developers to more easily write a single application that can be deployed against multiple data stores.
  • Finally, it provides a level of abstraction between the application and direct communication to the database to simplify the code the average developer needs to write. 

Universal Data Access

At first, there were no common interfaces for accessing data. Each data provider exposed an API or other means of accessing its data. The developer only had to be familiar with the API of the data provider used. When companies switched to a new database system, any knowledge of how to use the old system became worthless and the developer had to learn a new system from scratch. Something needed to be done to standardize the way in which data was retrieved from various sources.

ODBC

Open Database Connectivity (ODBC) helped address the problem of needing to know the details of each DBMS used. ODBC provides a single interface for accessing a number of database systems. To accomplish this, ODBC provides a driver model for accessing data. Any database provider can write a driver for ODBC to access data from their database system. This enables developers to access that database through the ODBC drivers instead of talking directly to the database system. For data sources such as files, the ODBC driver plays the role of the engine, providing direct access to the data source. In cases where the ODBC driver needs to connect to a database server, the ODBC driver typically acts as a wrapper around the API exposed by the database server.

ODBC was a huge leap forward and helped to greatly simplify database-driven application development. It does have some shortfalls, though.

  • First, it is only capable of supporting relational data. If you need to access a hierarchical data source such as LDAP, or semi-structured data, ODBC can’t help you.
  • Second, it can only handle SQL statements, and the result must be represent able in the form of rows and columns.

Overall, ODBC was a huge success, considering what the previous environment was like.

OLE-DB

Object Linking and Embedding Database (OLE-DB) was the next big step forward in data providers, and it is still widely used today. With OLE-DB, Microsoft applied the knowledge learned from developing ODBC to provide a better data access model. OLE-DB marked Microsoft’s move to a COM-based API, which made it easily consumable by most programming languages, and the migration to a 32-bit OS with the release of Windows 95.

OLE-DB Providers

OLE-DB is also much less dependent upon the physical structure of the database. It supports both relational and hierarchical data sources, and does not require the query against these data sources to follow a SQL structure. As with ODBC, vendors can create custom providers to expose access to their database system. Most people wouldn’t argue with the belief that it is far easier to write an OLE-DB provider than an ODBC driver. A provider needs to perform only four basic steps:

  1. Open the session.
  2. Process the command.
  3. Access the data.
  4. Prepare a rowset.

OLE-DB Consumers

The other half of the OLE-DB framework is the OLE-DB consumer. The consumer is the layer that speaks directly to the OLE-DB providers, and it performs the following steps:

  1. Identify the data source.
  2. Establish a session.
  3. Issue the command.
  4. Return a rowset.

Figure 1-1 shows how this relationship works.

 ADO.NET1 

Data Access Consumers

Developers who use languages that support pointerssuch as C, C++, VJ++, and so oncan speak directly to the ODBC and OLE-DB APIs. However, developers using a language such as Visual Basic need another layer. This is where the data access consumers such as DAO, RDO, ADO, and ADO.NET come into play.

DAO

With the release of Visual Basic 2.0, developers were introduced to a new method for accessing data, known as Data Access Objects (DAO). This was Microsoft’s first attempt to create a data consumer API.  

DAO was based on the JET engine, which was largely designed to help developers take advantage of the desktop database application Microsoft was about to release, Microsoft Access. It served to provide another layer of abstraction between the application and data access, making the developer’s task simpler.

With release of DAO 1.0, which supported direct communication with Microsoft Access databases without using ODBC.

 ADO.NET2

The main problem with DAO is that it can only talk to the JET engine. The JET engine then communicates with ODBC to retrieve the data. Going through this extra translation layer adds unnecessary overhead and makes accessing data through DAO slow. 

RDO

Remote Data Objects (RDO) was Microsoft’s solution to the slow performance created by DAO. For talking to databases other than Microsoft Access, RDO did not use the JET engine like DAO; instead, it communicated directly with the ODBC layer. Figure 1-3 shows this relationship.

Removing the JET engine from the call stack greatly improved performance to ODBC data sources! The JET engine was only used when accessing a Microsoft Access Database. In addition, RDO had the capability to use client-side cursors to navigate the records, as opposed to the server-side cursor requirements of DAO.

This greatly reduced the load on the database server, enabling not only the application to perform better, but also the databases on which that application was dependant.

ADO.NET3

RDO was primarily targeted toward larger, commercial customers, many of whom avoided DAO due to the performance issues. Instead of RDO replacing DAO, they largely co-existed. This resulted for several reasons: First, users who developed smaller applications, where performance wasn’t as critical, didn’t want to take the time to switch over to the new API. Second, RDO was originally only released with the Enterprise Edition of Visual Basic, so some developers didn’t have a choice. Third, with the release of ODBCDirect, a DAO add-on that routed the ODBC requests through RDO instead of the JET engine, the performance gap between the two became much smaller. Finally, it wasn’t long after the release of RDO that Microsoft’s next universal access API was released.

ADO 

Microsoft introduced ActiveX Data Objects (ADO) primarily to provide a higher-level API for working with OLE-DB. With this release, Microsoft took many of the lessons from the past to build a lighter, more efficient, and more universal data access API. Unlike RDO, ADO was initially promoted as a replacement for both DAO and RDO. At the time of its release, it (along with OLE-DB) was widely believed to be a universal solution for accessing any type of datafrom databases to e-mail, flat text files, and spreadsheets.

As stated before, ADO was primarily released to complement OLE-DB; however, ADO was not limited to just communicating with OLE-DB data sources. ADO introduced the provider model, which enabled software vendors to create their own providers relatively easily, which could then be used by ADO to communicate with a given vendor’s data source and implement many of the optimizations specific to that data source. The ODBC provider that shipped with ADO was one example of this. When a developer connected to an ODBC data source, ADO would communicate through the ODBC provider instead of through OLE-DB. More direct communication to the data source resulted in better performance and an easily extensible framework. Figure 1-4 shows this relationship.

 ADO.NET4

 

In addition to being a cleaner object model, ADO also offered a wider feature set to help lure developers away from DAO and RDO. These included the following:

Batch Updating—For the first time, users enjoyed the capability to make changes to an entire recordset in memory and then persist these changes back to the database by using the UpdateBatch command.

Disconnected Data Access—Although this wasn’t available in the original release, subsequent releases offered the capability to work with data in a disconnected state, which greatly reduced the load placed on database servers.

Multiple Recordsets—ADO provided the capability to execute a query that returns multiple recordsets and work with all of them in memory. This feature wasn’t even available in ADO.NET until this release, now known as Multiple Active Result Sets (MARS).

ADO.NET

With the release of the .NET Framework, Microsoft introduced a new data access model, called ADO.NET. The ActiveX Data Object acronym was no longer relevant, as ADO.NET was not ActiveX, but Microsoft kept the acronym due to the huge success of ADO. In reality, it’s an entirely new data access model written in the .NET Framework.

ADO.NET supports communication to data sources through both ODBC and OLE-DB, but it also offers another option of using database-specific data providers. These data providers offer greater performance by being able to take advantage of data-source-specific optimizations. By using custom code for the data source instead of the generic ODBC and OLE-DB code, some of the overhead is also avoided. The original release of ADO.NET included a SQL provider and an OLE-DB provider, with the ODBC and Oracle providers being introduced later. Many vendors have also written providers for their databases since. Figure 1.5 shows the connection options available with ADO.NET.

 ADO.NET5

With ADO.NET, the days of the recordset and cursor are gone. The model is entirely new, and consists of five basic objects:

Connection—The Connection object is responsible for establishing and maintaining the connection to the data source, along with any connection-specific information.

Command—The Command object stores the query that is to be sent to the data source, and any applicable parameters.

DataReader—The DataReader object provides fast, forward-only reading capability to quickly loop through the records.

DataSet—The DataSet object, along with its child objects, is what really makes ADO.NET unique. It provides a storage mechanism for disconnected data. The DataSet never communicates with any data source and is totally unaware of the source of the data used to populate it. The best way to think of it is as an in-memory repository to store data that has been retrieved.

DataAdapter—The DataAdapter object is what bridges the gap between the DataSet and the data source. The DataAdapter is responsible for retrieving the data from the Command object and populating the DataSet with the data returned. The DataAdapter is also responsible for persisting changes to the DataSet back to the data source.

ADO.NET made several huge leaps forward. Arguably, the greatest was the introduction of truly disconnected data access. Maintaining a connection to a database server such as MS SQL Server is an expensive operation. The server allocates resources to each connection, so it’s important to limit the number of simultaneous connections. By disconnecting from the server as soon as the data is retrieved, instead of when the code is done working with that data, that connection becomes available for another process, making the application much more scalable

Another feature of ADO.NET that greatly improved performance was the introduction of connection pooling. Not only is maintaining a connection to the database an expensive operation, but creating and destroying that connection is also very expensive. Connection pooling cuts down on this. When a connection is destroyed in code, the Framework keeps it open in a pool. When the next process comes around that needs a connection with the same credentials, it retrieves it from the pool, instead of creating a new one.      

Several other advantages are made possible by the DataSet object. The DataSet object stores the data as XML, which makes it easy to filter and sort the data in memory. It also makes it easy to convert the data to other formats, as well as easily persist it to another data store and restore it again.

 

Posted in 1. Microsoft.NET | Tagged: | 3 Comments »