In this lesson, you’ll learn how to use the SELECT
statement to retrieve one or more columns of data from a table.
As explained in Lesson 1, “Understanding SQL,” SQL statements are made up of plain English terms. These terms are called keywords, and every SQL statement is made up of one or more keywords. The SQL statement that you’ll probably use most frequently is the SELECT
statement. Its purpose is to retrieve information from one or more tables.
To use SELECT
to retrieve table data you must, at a minimum, specify two pieces of information—what you want to select, and from where you want to select it.
We’ll start with a simple SQL SELECT
statement, as follows:
Note: Following Along with the Examples
The sample SQL statements (and sample output) throughout the lessons in this book use a set of data files that are described in Appendix A, “Sample Table Scripts.” If you’d like to follow along and try the examples yourself (I strongly recommend that you do so), refer to Appendix A which contains instructions on how to download or create these data files.
It is important to understand that SQL is a language, not an application. The way that you specify SQL statements and display statement output varies from one application to the next. To assist you in adapting the examples to your own environment, Appendix B, “Working in Popular Applications,” explains how to issue the statements taught throughout this book using many popular applications and development environments. And if you need an application with which to follow along, Appendix B has recommendations for you too.
Note: Unsorted Data
If you tried this query yourself you might have discovered that the data was displayed in a different order than shown here. If this is the case, don’t worry—it is working exactly as it is supposed to. If query results are not explicitly sorted (we’ll get to that in the next lesson) then data will be returned in no order of any significance. It may be the order in which the data was added to the table, but it may not. As long as your query returned the same number of rows then it is working.
The previous statement uses the SELECT
statement to retrieve a single column called prod_name
from the Products
table. The desired column name is specified right after the SELECT
keyword, and the FROM
keyword specifies the name of the table from which to retrieve the data. The output from this statement is shown in the following:
prod_name
--------------------
Fish bean bag toy
Bird bean bag toy
Rabbit bean bag toy
8 inch teddy bear
12 inch teddy bear
18 inch teddy bear
Raggedy Ann
King doll
Queen doll
A simple SELECT
statement similar to the one used above returns all the rows in a table. Data is not filtered (so as to retrieve a subset of the results), nor is it sorted. We’ll discuss these topics in the next few lessons.
To retrieve multiple columns from a table, the same SELECT
statement is used. The only difference is that multiple column names must be specified after the SELECT
keyword, and each column must be separated by a comma.
The following SELECT
statement retrieves three columns from the products
table:
Tip: SQL Statement and Case
It is important to note that SQL statements are case-insensitive, so SELECT
is the same as select
, which is the same as Select
. Many SQL developers find that using uppercase for all SQL keywords and lowercase for column and table names makes code easier to read and debug. However, be aware that while the SQL language is case-insensitive, the names of tables, columns, and values may not be (that depends on your DBMS and how it is configured).
Just as in the prior example, this statement uses the SELECT
statement to retrieve data from the Products
table. In this example, three column names are specified, each separated by a comma. The output from this statement is shown below:
prod_id prod_name prod_price
--------- -------------------- ----------
BNBG01 Fish bean bag toy 3.4900
BNBG02 Bird bean bag toy 3.4900
BNBG03 Rabbit bean bag toy 3.4900
BR01 8 inch teddy bear 5.9900
BR02 12 inch teddy bear 8.9900
BR03 18 inch teddy bear 11.9900
RGAN01 Raggedy Ann 4.9900
RYL01 King doll 9.4900
RYL02 Queen dool 9.4900
In addition to being able to specify desired columns (one or more, as seen above), SELECT
statements can also request all columns without having to list them individually. This is done using the asterisk (*
) wildcard character in lieu of actual column names, as follows:
Tip: Terminating Statements
Multiple SQL statements must be separated by semicolons (the ;
character). Most DBMSs do not require that a semicolon be specified after single statements. But if your particular DBMS complains, you might have to add it there. Of course, you can always add a semicolon if you wish. It’ll do no harm, even if it is, in fact, not needed.
When a wildcard (*
) is specified, all the columns in the table are returned. The column order will typically, but not always, be the physical order in which the columns appear in the table definition. However, SQL data is seldom displayed as is. (Usually, it is returned to an application that formats or presents the data as needed). As such, this should not pose a problem.
As you have seen, SELECT
returns all matched rows. But what if you do not want every occurrence of every value? For example, suppose you want the vendor ID of all vendors with products in your products
table:
Tip: Use of White Space
All extra white space within a SQL statement is ignored when that statement is processed. SQL statements can be specified on one long line or broken up over many lines. So, the following three statements are functionality identical:
SELECT prod_name
FROM Products;
SELECT prod_name FROM Products;
SELECT
prod_name
FROM
Products;
Most SQL developers find that breaking up statements over multiple lines makes them easier to read and debug.
Tip: Take Care with Commas
When selecting multiple columns be sure to specify a comma between each column name, but not after the last column name. Doing so will generate an error.
Note: Presentation of Data
As you will notice in the above output, SQL statements typically return raw, unformatted data. Data formatting is a presentation issue, not a retrieval issue. Therefore, presentation (for example, displaying the above price values as currency amounts with the correct number of decimal places) is typically specified in the application that displays the data. Actual retrieved data (without application-provided formatting) is rarely used.
vend_id
----------
BRS01
BRS01
BRS01
DLL01
DLL01
DLL01
DLL01
FNG01
FNG01
The SELECT
statement returned 9 rows (even though there are only four vendors in that list) because there are 9 products listed in the products
table. So how could you retrieve a list of distinct values?
The solution is to use the DISTINCT
keyword which, as its name implies, instructs the database to only return distinct values.
SELECT DISTINCT vend_id
tells the DBMS to only return distinct (unique) vend_id
rows, and so only three rows are returned, as seen in the following output. If used, the DISTINCT
keyword must be placed directly in front of the column names.
vend_id
----------
BRS01
DLL01
FNG01
Caution: Using Wildcards
As a rule, you are better off not using the *
wildcard unless you really do need every column in the table. Even though use of wildcards may save you the time and effort needed to list the desired columns explicitly, retrieving unnecessary columns usually slows down the performance of your retrieval and your application.
Tip: Retrieving Unknown Columns
There is one big advantage to using wildcards. As you do not explicitly specify column names (because the asterisk retrieves every column), it is possible to retrieve columns whose names are unknown.
Caution: Can’t Be Partially DISTINCT
The DISTINCT
keyword applies to all columns, not just the one it precedes. If you were to specify SELECT DISTINCT vend_id, prod_price
, all rows would be retrieved because both of the specified columns were distinct.
SELECT
statements return all matched rows, possibly every row in the specified table. What if you want to return just the first row or a set number of rows? This is doable, but unfortunately, this is one of those situations where all SQL implementations are not created equal.
In Microsoft SQL Server and Microsoft Access you can use the TOP
keyword to limit the top number of entries, as seen here:
SELECT TOP 5 prod_name
FROM Products;
prod_name
-----------------
8 inch teddy bear
12 inch teddy bear
18 inch teddy bear
Fish bean bag toy
Bird bean bag toy
The previous statement uses the SELECT TOP 5
statement to retrieve just the first five rows.
If you are using DB2, well, then you get to use SQL unique to that DBMS, like this:
SELECT prod_name
FROM Products
FETCH FIRST 5 ROWS ONLY;
FETCH FIRST 5 ROWS ONLY
does exactly what it suggests.
If you are using Oracle you need to count rows based on ROWNUM
(a row number counter) like this:
SELECT prod_name
FROM Products
WHERE ROWNUM <=5;
If you are using MySQL, MariaDB, PostgreSQL, or SQLite, you can use the LIMIT
clause, as follows:
The previous statement uses the SELECT
statement to retrieve a single column. LIMIT 5
instructs the supported DBMSs to return no more than five rows. The output from this statement is shown in the following code.
To get the next five rows, specify both where to start and the number of rows to retrieve, like this:
LIMIT 5 OFFSET 5
instructs supported DBMSs to return five rows starting from row 5
. The first number is the number of rows to retrieve, and the second is where to start. The output from this statement is shown in the following code:
prod_name
-------------------
Rabbit bean bag toy
Raggedy Ann
King doll
Queen doll
So, LIMIT
specifies the number of rows to return. LIMIT
with an OFFSET
specifies where to start from. In our example there are only nine products in the Products
table, so LIMIT 5 OFFSET 5
returned just four rows (as there was no fifth).
Caution: Row 0
The first row retrieved is row 0
, not row 1
. As such, LIMIT 1 OFFSET 1
will retrieve the second row, not the first one.
Tip: MySQL, MariaDB, and SQLite Shortcut
MySQL, MariaDB, and SQLite support a shorthand version of LIMIT 4 OFFSET 3
, enabling you to combine them as LIMIT 3,4
. Using this syntax, the value before the ,
is the LIMIT
and the value after the ,
is the OFFSET
.
Note: Not ALL SQL Is Created Equal
I included this section on limiting results for one reason only, to demonstrate that while SQL is usually quite consistent across implementations, you can’t rely on it always being so. While very basic statements tend to be very portable, more complex ones tend to be less so. Keep that in mind as you search for SQL solutions to specific problems.
As you have seen, SQL statements are instructions that are processed by your DBMS. But what if you wanted to include text that you’d not want processed and executed? Why would you ever want to do this? Here are a few reasons:
create.sql
and populate.sql
files used in Appendix B)..sql
files.).Most DBMSs supports several forms of comment syntax. We’ll Start with inline comments:
SELECT prod_name -- this is a comment
FROM Products;
Comments may be embedded inline using --
(two hyphens). Anything after the --
is considered comment text, making this a good option for describing columns in a CREATE TABLE
statement, for example.
Here is another form of inline comment (although less commonly supported):
# This is a comment
SELECT prod_name
FROM Products;
A #
at the start of a line makes the entire line a comment. You can see this format comment used in the accompanying create.sql
and populate.sql
scripts.
You can also create multi line comments, and comments that stop and start anywhere within the script:
/* SELECT prod_name, vend_id
FROM Products; */
SELECT prod_name
FROM Products;
/*
starts a comments, and */
ends it. Anything between /*
and */
is comment text. This type of comment is often used to comment out code, as seen in this example. Here, two SELECT
statements are defined, but the first won’t execute because it has been commented out.
In this lesson, you learned how to use the SQL SELECT
statement to retrieve a single table column, multiple table columns, and all table columns. You also learned how to return distinct values and how to comment your code. And unfortunately, you were also introduced to the fact that more complex SQL tends to be less portable SQL. Next you’ll learn how to sort the retrieved data.
2.1 Write a SQL statement to retrieve all customer ids (cust_id
) from the Customers
table.
2.2 Write a SQL statement that retrieves all columns from the Customers
table, and an alternate SELECT
that retrieves just the customer id, and use comments to “comment out” one SELECT
so as to be able to run the other. (And of course, test both statements).