In this lesson, you’ll learn exactly what SQL is and what it will do for you.
Tip: Reviewing Basic Concepts
What follows is a very brief overview of some basic database concepts. It is intended to either jolt your memory if you already have some database experience, or to provide you with the absolute basics, if you are new to databases. Understanding databases is an important part of mastering SQL, and you might want to find a good book on database fundamentals to brush up on the subject if needed.
Caution: Misuse Causes Confusion
People often use the term database to refer to the database software they are running. This is incorrect, and it is a source of much confusion. Database software is actually called the Database Management System (or DBMS). The database is the container created and manipulated via the DBMS, and exactly what the database is and what form it takes varies from one database to the next.
Note: Table Names
What makes a table name unique is actually a combination of several things including the database name and table name. Some databases also use the name of the database owner as part of the unique name. This means that while you cannot use the same table name twice in the same database, you definitely can reuse table names in different databases.
Tip: Breaking Up Data
It is extremely important to break data into multiple columns correctly. For example, city, state, and ZIP code should always be separate columns. By breaking these out, it becomes possible to sort or filter data by specific columns (for example, to find all customers in a particular state or in a particular city). If city and state are combined into one column, it would be extremely difficult to sort or filter by state.
When you break up data, the level of granularity is up to you and your specific requirements. For example, addresses are typically stored with the house number and street name together. This is fine, unless you might one day need to sort data by street name, in which case splitting house number and street name would be preferable.
Caution: Datatype Compatibility
Datatypes and their names are one of the primary sources of SQL incompatibility. While most basic datatypes are supported consistently, many more advanced datatypes are not. And worse, occasionally you’ll find that the same datatype is referred to by different names in different DBMSs. There is not much you can do about this, but it is important to keep in mind when you create table schemas.
Note: Records or Rows?
You may hear users refer to database records when referring to rows. For the most part the two terms are used interchangeably, but row is technically the correct term.
Tip: Always Define Primary Keys
Although primary keys are not actually required, most database designers ensure that every table they create has a primary key so that future data manipulation is possible and manageable.
The fact that you are reading a book on SQL indicates that you, somehow, need to interact with databases. SQL is a language used to do just this, so before looking at SQL itself, it is important that you understand some basic concepts about databases and database technologies.
Whether you are aware of it or not, you use databases all the time. Each time you select a name from your email address book, you are using a database. If you conduct a search on an Internet search site, you are using a database. When you log into your network at work, you are validating your name and password against a database. Even when you use your ATM card at a cash machine, you are using databases for PIN number verification and balance checking.
But even though we all use databases all the time, there remains much confusion over what exactly a database is. This is especially true because different people use the same database terms to mean different things. Therefore, a good place to start our study is with a list and explanation of the most important database terms.
The term database is used in many different ways, but for our purposes (and indeed, from SQL’s perspective) a database is a collection of data stored in some organized fashion. The simplest way to think of it is to imagine a database as a filing cabinet. The filing cabinet is simply a physical location to store data, regardless of what that data is or how it is organized.
When you store information in your filing cabinet, you don’t just toss it in a drawer. Rather, you create files within the filing cabinet, and then you file related data in specific files.
In the database world, that file is called a table. A table is a structured file that can store data of a specific type. A table might contain a list of customers, a product catalog, or any other list of information.
The key here is that the data stored in the table is one type of data or one list. You would never store a list of customers and a list of orders in the same database table. Doing so would make subsequent retrieval and access difficult. Rather, you’d create two tables, one for each list.
Every table in a database has a name that identifies it. That name is always unique—meaning no other table in that database can have the same name.
Tables have characteristics and properties that define how data is stored in them. These include information about what data may be stored, how it is broken up, how individual pieces of information are named, and much more. This set of information that describes a table is known as a schema, and schemas are used to describe specific tables within a database, as well as entire databases (and the relationship between tables in them, if any).
Tables are made up of columns. A column contains a particular piece of information within a table.
The best way to understand this is to envision database tables as grids, somewhat like spreadsheets. Each column in the grid contains a particular piece of information. In a customer table, for example, one column contains the customer number, another contains the customer name, and the address, city, state, and ZIP code are all stored in their own columns.
Each column in a database has an associated datatype. A datatype defines what type of data the column can contain. For example, if the column were to contain a number (perhaps the number of items in an order), the datatype would be a numeric datatype. If the column were to contain dates, text, notes, currency amounts, and so on, the appropriate datatype would be used to specify this.
Datatypes restrict the type of data that can be stored in a column (for example, preventing the entry of alphabetical characters into a numeric field). Datatypes also help sort data correctly and play an important role in optimizing disk usage. As such, special attention must be given to picking the right datatype when tables are created.
Data in a table is stored in rows; each record saved is stored in its own row. Again, envisioning a table as a spreadsheet style grid, the vertical columns in the grid are the table columns, and the horizontal rows are the table rows.
For example, a customers table might store one customer per row. The number of rows in the table is the number of records in it.
Every row in a table should have some column (or set of columns) that uniquely identifies it. A table containing customers might use a customer number column for this purpose, whereas a table containing orders might use the order ID. An employee list table might use an employee ID or the employee Social Security number column.
This column (or set of columns) that uniquely identifies each row in a table is called a primary key. The primary key is used to refer to a specific row. Without a primary key, updating or deleting specific rows in a table becomes extremely difficult as there is no guaranteed safe way to refer to just the rows to be affected.
Any column in a table can be established as the primary key, as long as it meets the following conditions:
Primary keys are usually defined on a single column within a table. But this is not required, and multiple columns may be used together as a primary key. When multiple columns are used, the rules listed above must apply to all columns, and the values of all columns together must be unique (individual columns need not have unique values).
There is another very important type of key called a foreign key, but I’ll get to that later on in Lesson 12, “Joining Tables.”
SQL (pronounced as the letters S-Q-L or as sequel) is an abbreviation for Structured Query Language. SQL is a language designed specifically for communicating with databases.
Unlike other languages (spoken languages like English, or programming languages like Java, C, or PHP), SQL is made up of very few words. This is deliberate. SQL is designed to do one thing and do it well—provide you with a simple and efficient way to read and write data from a database.
What are the advantages of SQL?
And with that, let’s learn SQL.
Note: SQL Extensions
Many DBMS vendors have extended their support for SQL by adding statements or instructions to the language. The purpose of these extensions is to provide additional functionality or simplified ways to perform specific operations. And while often extremely useful, these extensions tend to be very DBMS specific, and they are rarely supported by more than a single vendor.
Like any language, the best way to learn SQL is to try it for yourself. To do this you’ll need a database and an application with which to test your SQL statements.
All of the lessons in this book use real SQL statements and real database tables. Appendix A, “The Example Tables,” explains what the example tables are, and provides details on how to obtain (or create) them so that you may follow along with the instructions in each lesson. Appendix B, “Working in Popular Applications,” describes the steps needed to execute your SQL in a variety of applications. Before proceeding to the next lesson, I’d strongly suggest that you turn to these two appendixes so that you’ll be ready to follow along.
In this first lesson, you learned what SQL is and why it is useful. Because SQL is used to interact with databases, you also reviewed some basic database terminology.