Select (SQL)

From wiki.gis.com

The SQL SELECT statement returns a result set of records from one or more tables.[1][2]

It retrieves zero or more rows from one or more base tables, temporary tables, or views in a database. In most applications, SELECT is the most commonly used Data Manipulation Language (DML) command. As SQL is a non-procedural language, SELECT queries specify a result set, but do not specify how to calculate it: translating the query into an executable "query plan" is left to the database system, more specifically to the query optimizer.

The SELECT statement has many optional clauses:

  • WHERE specifies which rows to retrieve.
  • GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
  • HAVING selects among the groups defined by the GROUP BY clause.
  • ORDER BY specifies an order in which to return the rows.
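The clauses listed above can be combined in a single statement. The following is a minimal sketch rather than a definitive example: it assumes Python's built-in sqlite3 module and a hypothetical sales table (the table, its columns, and its rows are invented for illustration).

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the clauses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("south", 5),
     ("south", 2), ("east", 50), ("east", -1)],
)

query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 0          -- keep only rows with a positive amount
    GROUP BY region           -- form one group per region
    HAVING SUM(amount) >= 10  -- keep only groups whose total is at least 10
    ORDER BY total DESC       -- return the largest total first
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 50), ('north', 40)]
```

WHERE filters individual rows before grouping, while HAVING filters whole groups afterwards, which is why the south group (total 7) is dropped even though its rows pass the WHERE condition.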

Examples

Table "T"    Query                               Result

C1  C2       SELECT * FROM T;                    C1  C2
1   a                                            1   a
2   b                                            2   b

C1  C2       SELECT C1 FROM T;                   C1
1   a                                            1
2   b                                            2

C1  C2       SELECT * FROM T WHERE C1 = 1;       C1  C2
1   a                                            1   a
2   b

C1  C2       SELECT * FROM T ORDER BY C1 DESC;   C1  C2
1   a                                            2   b
2   b                                            1   a

Given a table T, the query SELECT * FROM T returns every column of every row of the table.

With the same table, the query SELECT C1 FROM T returns only the values of column C1 for every row of the table. This is similar to a projection in relational algebra, except that in the general case the result may contain duplicate rows. Some database documentation also calls this a vertical partition: the query output is restricted to the specified fields or columns.

With the same table, the query SELECT * FROM T WHERE C1 = 1 returns every column of each row in which the value of column C1 is 1. In relational algebra terms, the WHERE clause performs a selection. This is also known as a horizontal partition: the rows output by the query are restricted according to the specified conditions.

Finally, the query SELECT * FROM T ORDER BY C1 DESC returns the same rows as the first query, sorted by C1 in descending order.
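The example queries can be tried directly. The sketch below assumes Python's built-in sqlite3 module and an in-memory database; the column types chosen for T are an assumption, since the article does not state them.

```python
import sqlite3

# Build the two-row table T from the examples (column types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (C1 INTEGER, C2 TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, "a"), (2, "b")])

# All columns of all rows.
all_rows = conn.execute("SELECT * FROM T").fetchall()

# Only column C1 (projection, or vertical partition).
projection = conn.execute("SELECT C1 FROM T").fetchall()

# Only rows where C1 = 1 (selection, or horizontal partition).
selection = conn.execute("SELECT * FROM T WHERE C1 = 1").fetchall()

# All rows, sorted by C1 in descending order.
descending = conn.execute("SELECT * FROM T ORDER BY C1 DESC").fetchall()

for label, result in [("all", all_rows), ("projection", projection),
                      ("selection", selection), ("descending", descending)]:
    print(label, result)
```

Only the last query specifies ORDER BY, so it is the only one whose row order is guaranteed.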

Limiting result rows

It is often convenient to specify a maximum number of rows to return. This can be used for testing, or to avoid consuming excessive resources if the query returns more information than expected. The syntax for doing this varies by vendor.

In ISO SQL:2003, result sets may be limited by using

  • cursors, or
  • SQL window functions in the SELECT statement.

ROW_NUMBER() window function

ROW_NUMBER() OVER may be used for a simple limit on the returned rows. For example, to return no more than ten rows:

SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_num,
    columns  -- placeholder for the list of columns to return
  FROM tablename
) AS foo
WHERE row_num <= 10

ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
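A runnable version of this pattern might look like the following sketch. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions); the payload column and the 25 sample rows are hypothetical, and the alias row_num is chosen here to avoid reusing the function name as an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "payload" is a hypothetical column standing in for the real column list.
conn.execute("CREATE TABLE tablename (sort_key INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(k, f"row-{k}") for k in range(1, 26)],  # 25 rows with unique sort keys
)

query = """
    SELECT sort_key, payload FROM (
        SELECT
            ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_num,
            sort_key,
            payload
        FROM tablename
    ) AS foo
    WHERE row_num <= 10
    ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
print(len(rows))  # 10
```

Because sort_key is unique in this sample data, the same ten rows are returned on every run.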

RANK() window function

The RANK() OVER window function acts like ROW_NUMBER, but may return more than n rows when ties occur. For example, to return the ten youngest people:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10

The above query could return more than ten rows. For example, if two people tie for the tenth rank because they have the same age, it returns eleven rows.
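The tie behaviour can be observed with a small data set. This is a sketch under stated assumptions: Python's built-in sqlite3 module with SQLite 3.25 or later, and twelve invented people whose ages are chosen so that two of them share the tenth rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, person_name TEXT, age INTEGER)"
)
# Hypothetical data: nine distinct ages, then two people aged 29, then one aged 40.
ages = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 29, 40]
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(i, f"person-{i}", age) for i, age in enumerate(ages, start=1)],
)

query = """
    SELECT ranking, person_id, age FROM (
        SELECT
            RANK() OVER (ORDER BY age ASC) AS ranking,
            person_id,
            person_name,
            age
        FROM person
    ) AS foo
    WHERE ranking <= 10
    ORDER BY ranking, person_id
"""
rows = conn.execute(query).fetchall()
print(len(rows))  # 11: both people aged 29 receive rank 10
```

The person aged 40 receives rank 12 rather than 11, because RANK leaves a gap after a tie.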

Non-standard syntax

Result limits

Not all DBMSes support the window functions mentioned above, so non-standard syntax must be used instead. Variants of the simple limit query for different DBMSes are listed below:

Query                                           DBMS
SELECT * FROM T LIMIT 10 OFFSET 20              MySQL, PostgreSQL (also supports the standard, since version 8.4), SQLite, H2
SELECT * FROM T WHERE ROWNUM <= 10              Oracle (also supports the standard, since Oracle8i)
SELECT FIRST 10 * FROM T                        Ingres
SELECT FIRST 10 * FROM T ORDER BY a             Informix
SELECT SKIP 20 FIRST 10 * FROM T ORDER BY c, d  Informix (row numbers are filtered after ORDER BY is evaluated; the SKIP clause was introduced in a v10.00.xC4 fixpack)
SELECT * FROM T FETCH FIRST 10 ROWS ONLY        DB2 (also supports the standard, in Linux, Windows, and Unix since DB2 v8; z/OS support added in v9)
SELECT TOP 10 * FROM T                          MS SQL Server (also supports the standard, since SQL Server 2005), Sybase ASE, MS Access
SELECT TOP 10 START AT 20 * FROM T              Sybase SQL Anywhere (also supports the standard, since version 9.0.1)
SELECT FIRST 10 SKIP 20 * FROM T                Interbase, Firebird
SELECT * FROM T ROWS 20 TO 30                   Firebird (since version 2.1)
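As an illustration of the LIMIT ... OFFSET variant, the sketch below runs it against SQLite through Python's built-in sqlite3 module. The single-column table and its 50 rows are assumptions made for the example, and an ORDER BY clause is added so that the rows skipped and returned are well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the integers 1 through 50.
conn.execute("CREATE TABLE T (C1 INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(n,) for n in range(1, 51)])

# Skip the first 20 rows in C1 order, then return at most 10 rows.
rows = conn.execute(
    "SELECT * FROM T ORDER BY C1 LIMIT 10 OFFSET 20"
).fetchall()
values = [r[0] for r in rows]
print(values)  # [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```

Without ORDER BY, the DBMS is free to return any ten rows, so a limit is usually paired with an explicit sort.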

Hierarchical query

Some databases provide specialised syntax for hierarchical data.
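One form such a query can take is a recursive common table expression (WITH RECURSIVE), which SQLite supports. This is named here as one possible technique, not necessarily the specialised syntax the article has in mind. The sketch assumes Python's built-in sqlite3 module and a hypothetical employee table in which each row points to its manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hierarchy: manager_id refers to the id of the parent row.
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "root", None), (2, "alice", 1), (3, "bob", 1), (4, "carol", 2)],
)

query = """
    WITH RECURSIVE chain(id, name, depth) AS (
        -- anchor: rows with no manager sit at the top of the hierarchy
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        -- recursive step: attach each employee below their manager
        SELECT e.id, e.name, chain.depth + 1
        FROM employee AS e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('root', 0), ('alice', 1), ('bob', 1), ('carol', 2)]
```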

Window function

A window function in SQL:2003 is an aggregate function applied to a partition of the result set.

For example,

  sum(population) OVER( PARTITION BY city )

calculates the sum of the populations of all rows having the same city value as the current row.

Partitions are specified using the OVER clause which modifies the aggregate. Syntax:

<OVER_CLAUSE> ::=
   OVER ( [ PARTITION BY <expr>, ... ]
          [ ORDER BY <expr>, ... ] )

The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as ROW_NUMBER.
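The population example can be run as follows. This is a minimal sketch assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the district table, its columns other than city and population, and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each row is a district belonging to a city.
conn.execute("CREATE TABLE district (city TEXT, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [("Oslo", "north", 100), ("Oslo", "south", 150), ("Bergen", "centre", 80)],
)

query = """
    SELECT city, name, population,
           SUM(population) OVER (PARTITION BY city) AS city_population
    FROM district
    ORDER BY city, name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike GROUP BY, the window function keeps one output row per input row and repeats the partition total (250 for both Oslo rows) alongside each of them.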

References

  • Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online
