How to Use the OPENROWSET Function to Read Text Files

openrowset

Fayed, a student in Learning Tree’s Introduction to SQL Server, asked how to import text files into SQL Server. Quite a difficult question, actually, because there are many different ways. The hardest part is choosing the method that is best for the task at hand. Today we’re going to take a look at reading text files using SQL Server’s OPENROWSET ( ) function.

Manu SQL Server users discover OPENROWSET in the context of querying linked servers. However, OPENROWSET can read files as well. It can perform single blob reads, which can be useful for bringing XML data into columns or variables of XML datatype. A major advantage of OPENROWSET for the reading of text files is that we don’t need to import the data into a table if we do not wish. OPENROWSET allows you to directly query data from a file on the hard drive without importing it.

To use OPENROWSET with a flat file, we must first create a format file describing the file structure. This can be done using the classic technique or by creating a newer XML-style format file. We will use the classic technique here.

We’ll start with a sample file, DJ1985.txt, that contains daily Dow Jones averages from 1885 to 2008. This file is a conventional comma-separated values file. The data file and associated format and query files can be downloaded here. The format file necessary to read DJ1885.txt looks like this:

Format file for DJ1885 data file

The first line is the version number. The version number for SQL Server 2016 is 13.0, but 9.0 is used here for compatibility with older versions of SQL Server. Below the version number is the number of columns, in this case 8. Then, for each column in the file, there is a descriptor row in the format file. In this example each column has the same datatype, SQLCHAR, meaning that the file contains ANSI text data in the current code page. The datatype column refers to the datatype in the file, not the destination datatype in SQL Server. Column 1, for example, contains date data, but it’s still SQLCHAR in the file.

The third and fourth columns describe the column width, and are ignored if a column separator character is provided. In this example, all the numbers in column four in the format file could be set to zero and it would make no difference.

We will not discuss fixed-width files here, but there is a fixed-width version of the data file and a corresponding format file in the zip file provided with this blog.

Querying the File

Once the format file is complete, we are ready to write our query.

SELECT * FROM OPENROWSET(BULK ’C:\Temp\DJ1885.txt’ ,

FORMATFILE=’C:\Temp\DJ1885.fmt’) AS Test — Correlation name for table is required

Since this SELECT statement follows the same rules as any other SELECT, we can join the text results to other tables or merely include a WHERE clause. Similarly, since the column names are defined in the format file, we are not limited to selecting “*”.

SELECT Date, DJIA FROM OPENROWSET(BULK ’C:\Temp\DJ1885.txt’ ,

FORMATFILE=’C:\Temp\DJ1885.fmt’) AS Test — Correlation name for table is required

WHERE DayOfWeek = ’Mo’

Performing Numeric Calculations

SQL Server knows that it is reading text, but has no way of knowing what that text means to us humans. Many numeric calculations will therefore require an explicit type conversion.

SELECT DATEPART(YY, Date)AS Year, AVG(CAST(DJIA AS DECIMAL(8,2)))AS Average FROM OPENROWSET(BULK ’C:\Temp\DJ1885.txt’ ,

FORMATFILE=’C:\Temp\DJ1885.fmt’) AS Test

GROUP BY DATEPART(YY, Date)

ORDER BY Year

A Much Bigger File and Comparison with BULK INSERT

Of course, if we can select rows from the file we can also insert those rows into an SQL Server table. This raises the question of how OPENROWSET compares with BULK INSERT. To examine this, we will use the larger Ratings.csv file from the MovieLens dataset, a popular example dataset for learning about data mining. The format and query files can be found here. The MovieLens dataset set used here is quite large, including 20 million individual ratings. Grouplens.org provides several download options for this data and smaller subset data files.

Here is the format file:

Format file for movieLen ratings file

We can use the same format file for both OPENROWSET and BULK INSERT. The data file, however, contains column names in the first row, so we must use FIRSTROW to indicate that the first row of data is actually row 2 in this file.

INSERT Ratings SELECT * FROM OPENROWSET(BULK C:\Temp\Ratings.csv’ ,

FORMATFILE=C:\Temp\MovieLensRatings.fmt’,

ERRORFILE=C:\Temp\MovieLensRatings.err’, — error files are created and must not already exist

— error files are not created unless there are errors

— first row in file contains column names; data starts in row 2

FIRSTROW=2) AS Test

The corresponding BULK INSERT statement:

BULK INSERT Ratings FROM C:\Temp\Ratings.csv’ WITH (FORMATFILE=C:\Temp\MovieLensRatings.fmt’, FIRSTROW=2)

The INSERT statement using OPENROWSET executes in 1:09. The BULK INSERT statement took 2:28. We cannot draw any conclusions without testing more files and a variety of different circumstances, but I was pleased to discover that in this case OPENROWSET actually seemed to run faster than BULK INSERT.

Conclusion

Though often associated with linked servers, the OPENROWSET( ) function provides an easy and convenient means to query and import data from flat files. In at least some cases, OPENROWSET may be faster than BULK INSERT.

Learn how to execute dynamic queries using OPENROWSET in Learning Tree’s course, SQL Server Transact-SQL Programming.

Become a True Data Wrangler with SQL Programming Training

View Course

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.