Introduction to SAS

Ecoomics 6896

Spring, 2001

 

Tony Lima

 

 


Table of Contents

 

I.                  Introduction to SAS.. 3

II.                Starting SAS.. 3

III.               SAS Data Storage. 5

IV.              Working with SAS Tables. 5

V.                Importing External Data. 10

VI.              Entering Data. 15

VII.             Time Series Analysis. 27

VIII.            Univariate Data Analysis. 32

IX.              Multivariate Data Analysis. 38

X.                Conclusion. 45

 

 


Table of Figures

 

Figure 1: SAS Opening screen. 2

Figure 2: Available Libraries. 3

Figure 3: The Prdsale table. 5

Figure 4: Save As Dialog Box. 6

Figure 5: New Library dialog box. 7

Figure 6: Save As MyProductSales. 8

Figure 7: Column attributes. 9

Figure 8: Import Wizard. 10

Figure 9: Import Wizard, Select File. 11

Figure 10: Import Wizard, Select Library and Member 12

Figure 11: Import Wizard, Create SAS Statements. 13

Figure 12: The Excel Table. 14

Figure 13: SAS ASSIST Start Menu. 15

Figure 14: SAS ASSIST Block Menu. 16

Figure 15: Enter Data Interactively. 17

Figure 16: Creating a SAS Table. 18

Figure 17: Select a library. 19

Figure 18: New Table Structure. 20

Figure 19: Sample data structure. 21

Figure 20: Data entry. 22

Figure 21: First new record. 23

Figure 22: Edit Data selection window.. 24

Figure 23: Edit Data One Record at a Time. 25

Figure 24: The first record. 26

Figure 25: The column selection dialog box. 27

Figure 26: The seasonal adjustment setup. 28

Figure 27: Tests for seasonality. 29

Figure 28: Seasonal factors. 30

Figure 29: Domestic airline passenger miles are seasonal 31

Figure 30: Summary Statistics. 32

Figure 31: The MEANS Procedure results. 33

Figure 32: Interactive data analysis window.. 34

Figure 33: Interactive data analysis variable selection. 35

Figure 34: Histogram.. 36

Figure 35: Descriptive statistics. 37

Figure 36: Regression window.. 38

Figure 37: Subset Data window.. 39

Figure 38: Build WHERE Clause. 40

Figure 39: Column selection. 41

Figure 40: Build WHERE clause. 42

Figure 41: Select a Product Type. 43

Figure 42: Regression Output 44


I.                   Introduction to SAS

A.               Stands for Statistical Analysis System.  Has been around for over 20 years.

B.               With version 8.0, interactive tools have become usable.  SAS is trying to become a data warehousing, data mining tool.  As such, there are many new features that may not be necessary.

C.               As noted in e-mail, a full SAS installation requires 588,709,888 bytes of disk space.

II.                Starting SAS

A.               Either double click the icon on the desktop  or select Start/Programs/The SAS System/The SAS System for Windows v.8.  (This will be under T on the Start menu rather than under S as you might expect.)

B.               Either way, you'll see Figure 1.

Figure 1: SAS Opening screen

C.               The left frame is the "SAS Explorer" (similar to the Windows Explorer).  Double-click "Libraries" and you'll see all the available libraries (similar to Figure 2).

1.                 Tip: to move back up a level in the Explorer frame, make sure the Explorer frame is selected, then click the up-arrow folder on the SAS toolbar.

2.                 The Explorer frame is selected when it looks like Figure 2, with the red background behind the word Explorer.

3.                 If the Explorer frame is not selected, click the word Explorer once.

Figure 2: Available Libraries

D.               Getting around

1.                 The SAS toolbar is the bar that summarizes commands.  It's just below the menu bar.

2.                 Open SAS windows are shown at the bottom of the SAS window.  You can view the contents of any window by clicking its button.

3.                 By default, the Explorer, Log and Editor windows are opened when you run SAS.  When SAS produces some output, the Output window will be maximized.

4.                 Warning: If you don't clear the output window using Edit/Clear all, your output will accumulate.

III.              SAS Data Storage

A.               SAS stores data in Libraries.  A library is (roughly) a folder on your hard drive.  The default location for these folders is \My Documents\My SAS Files.

B.               A library can contain a number of different data types.  We'll mainly be concerned with data tables.

C.               A data table is organized in rows and columns.  The columns are often called "fields" and the rows "records."  Each column includes:

1.                 a column name

2.                 a data type.  Character (text), numeric and date are the three most common data types.

3.                 a field width, the longest possible entry a field can contain.

4.                 the number of places to the right of the decimal point (numeric fields only).

5.                 Examples: a field format of 8.0 means the field can contain up to 8 spaces for numbers with no decimal places. A format of $CHAR10 can contain up to 10 text characters; the $ sign means blank spaces are allowed. A format of DOLLAR12.2 means a dollar currency field with 12 spaces and 2 decimal places.  The 12 spaces allowed include the dollar sign, commas, the decimal point, and the 2 decimal places.  Count every space.

6.                 Rule: when in doubt, make the field wider.

IV.            Working with SAS Tables

A.               Double-click the SASHelp library, then scroll down until you see the Prdsale table.

B.               Double-click Prdsale. You should see Figure 3.

Figure 3: The Prdsale table

C.               Select File/Save As to open the Save As dialog box (Figure 4).

Figure 4: Save As Dialog Box

1.                 Notice the two unusual buttons near the top right corner of the dialog box.  The one furthest left is the "Create new library" button while the one next to it is the "Create new catalog" button.

2.                 From the SAS help system:

A SAS catalog is a SAS file that contains entries. The entries in a catalog serve a variety of utility purposes. For example, the function key settings that you use in a display manager session are stored in a KEYS entry. Catalogs are created by procedures or the CATALOG window. The entries within a catalog are created in various ways, depending on the type of entry.

A SAS data library is defined as a collection of SAS files that are recognized as a unit by the SAS System. On directory-based systems, it is a group of files in the same directory.

There is no limit to the number of SAS files you can store in a SAS data library. You can have different kinds of SAS files in one library. SAS files are stored and retrieved according to the SAS data library to which they belong.

3.                 Libraries can contain catalogs.  For our purposes, we will work only with libraries (for now).

D.               Click the Create new library button.

1.                 If you can't figure out which button that is, hover the mouse pointer over a button for a few seconds and a tag will appear identifying it. You'll see Figure 5.

Figure 5: New Library dialog box

2.                 Click the Browse button, pull down the directory menu and double-click My Documents, then double-click My SAS Files.  Click OK.

3.                 Type Mylib in the Name box, then click OK.  This creates the Mylib library.  However you still must double-click the library name to select it.

4.                 You should be looking at the Save As dialog box. Type MyProductSales in the name box and click OK (Figure 6).

Figure 6: Save As MyProductSales

5.                 Open the following tables from the SASHelp library and save them to Mylib: USEcon and Retail.

6.                 Open the Mylib library, then open MyProductSales.

7.                 Click the column header for Predicted Sales once, then right-click in the column header. Select Column Attributes from the popup menu.  You'll see Figure 7.

Figure 7: Column attributes

Even though you can't see all of it, the Format box says DOLLAR12.2.  This means the column is formatted as dollars (other currencies are available), the total number of spaces allowed is 12 and there are 2 places to the right of the decimal point. 

The total number of spaces must be large enough to accommodate the largest number.  When counting spaces, be sure to allow for the currency sign, the decimal point, the comma, and any special characters associated with negative numbers.

Rule of thumb: when in doubt, increase the column width.  Disk space is cheap; reformatting (and possibly losing) your data table is expensive.

8.                 Explore the column attributes by clicking the … button to the right of the Format text box.

To close the Format window, click the x box in the upper right corner.

9.                 Spend a few minutes looking at attributes of other columns. When you're done, close the MyProductSales window.

V.               Importing External Data

A.               The easiest way to import data is to use the SAS Import Wizard.  Select File/Import Data… from the SAS menu.  You'll see Figure 8.

Figure 8: Import Wizard

B.               Pull down the listbox under "data source" to see the data types available.  If possible, arrange your data in either an Excel table or a delimited table.

1.                 A "delimited table" has a specified character separating each field value, usually with a carriage return line feed combination at the end of each record.

2.                 Common types are comma-delimited and tab-delimited.  Remember, the character you use to delimit the fields should not be used as a character in the data.

C.               Select Excel97 and click the Next button. You'll see Figure 9.

Figure 9: Import Wizard, Select File

D.               Before you can save the imported file, you must select a library and give the file a SAS name as shown in Figure 10.

Figure 10: Import Wizard, Select Library and Member

E.               SAS will give you the option of saving the SAS commands used to import the table.  While you probably don't plan on doing this more than once, your plans are probably wrong.  Give the command file a name (Figure 11).

Figure 11: Import Wizard, Create SAS Statements

F.                Click Finish to import the data.

G.              The Excel table we just imported looks like Figure 12

Figure 12: The Excel Table

The first row should be the field names.  SAS wants these names to be 8 characters or less, all capital letters, preceded with a single quote (').  Note that the SAS field name can be much longer; however, you must make this change by changing the field attributes after the table is imported.

Each row is one data record.  If you're creating longitudinal data, create a separate table for each period, then worry about working with the longitudinal data later.

If you can, format each data column in Excel.  Select the column (excluding the first row which contains the field name), right click in any cell, click the Number tab and change the format to whatever you want SAS to make it.  Stick to Currency, Number and Date types if you can.  If you have text fields, make them General type in Excel.

You can select a column of data by clicking in the top cell of the first row of actual data (row 2), scrolling to the last row and clicking in the bottom cell of the column while holding down the shift key.  If you put columns that have exactly the same data type next to each other, you can format them all at once by clicking the top left cell and the bottom right cell while holding down the shift key.

VI.            Entering Data

A.               First, run SAS ASSIST by selecting Solutions/ASSIST from the SAS menu.

1.                 You'll first see Figure 13.

Figure 13: SAS ASSIST Start Menu

2.                 Click Continue. This will open the SAS ASSIST block menu (Figure 14).

Figure 14: SAS ASSIST Block Menu

3.                 Click the Data Mgmt button, then select Create Data from the popup menu.  You'll see a second popup; usually you'll select Interactively, which takes you to Figure 15.

Figure 15: Enter Data Interactively

Usually the best option is to enter data in tabular form. However, if you have a large number of fields, you should enter the data one record at a time.  In section VI.A.10 we'll see why.

Before we leave this screen, note the weird button labeled "Goback."  SAS uses this instead of the more usual "Close" button.  Goback returns you to the screen you were previously looking at.

4.                 Click "Enter data in tabular form." You'll see Figure 16.

Figure 16: Creating a SAS Table

5.                 Type a name for your table and click the Permanent radio button.  This will open the "Select a library" window (Figure 17).

Figure 17: Select a library

6.                 Click the library you want followed by OK.  This will take you back to the Create new SAS Table window (Figure 16). Click OK.

7.                 The "New Table Structure" window will open (Figure 18).

Figure 18: New Table Structure

This window is where you set up your field names, data type, length, give them a label, and define their format.  Let's look at these one at a time:

1.      Name is the name of the field.  It should be 8 characters long or less and contain only letters and numbers.  It should not begin with a letter.

2.      Data types can be numeric (N), or character (C or $).

3.      The default length is 8.  This is also the maximum length for numeric variables. For character variables, the maximum length is 200.

4.      The label can be up to 40 characters long.  It should describe what's in the field.

5.      The format is the way the data will be displayed.  The DOLLAR12.2 format is the best known. For more information about SAS formats, search the help file for the following topics: FORMATS: Numeric formats;

8.                 Here's a sample data structure (Figure 19).

Figure 19: Sample data structure

9.                 When you're done creating the structure, click the X to close the window. 

Chances are you've made a mistake somewhere.  SAS informs you of this by refusing to close the window and highlighting the area where there's a problem. There may be an error message on the message bar at the bottom of the screen.  The area may not be highlighted; look for the field in which the mouse cursor is placed.

10.             When you're done entering the data structure (and get it correct), SAS will close the data structure window and open the data entry window (Figure 20).

Figure 20: Data entry

SAS expects you to enter data in a tabular format. However, our fields are too wide and there are too many of them to see them all at once.  Let's enter a record.

11.             Entering Data

This is more difficult than it should be.  SAS wants you to use the Tab key to move between fields.  Unfortunately, it will only let you move between fields you can see on the screen.  That means you must use the horizontal scroll bar to scroll field names, then select the next empty field and enter the data and so on.  

When you enter the data for the last field in a record, press Enter to start a new record. Figure 21 shows part of the data for record 1.

Figure 21: First new record

Select File/Save from the SAS menu, then click X to close the data entry table.

12.             Entering Data One Record at a Time

Click the Data Mgmt button, then select Edit/Browse and Edit data from the popup menus.  You'll see Figure 22.

Figure 22: Edit Data selection window

This time click the Single row button.  SAS will remember the last library and table you opened and make those the default (Figure 23).

Figure 23: Edit Data One Record at a Time

If you're happy with this selection, click the Run button.

Otherwise, select a new library by clicking the Libref button and/or a new table by clicking the Table button.  Remember to select the library first if you want to change both.

You'll see Figure 24

Figure 24: The first record

To add a new record, select Edit/Add new record.  If SAS won't let you type data, press the Ins key once.  (The cursor should look like a block, not a vertical line.)

Guess what?  You have to select Edit/Add new record again.

13.             Recommendation: create your data in Excel, then import it.

14.             Second recommendation: save your data frequently by clicking the  button on the toolbar.

VII.          Time Series Analysis

A.               Select Solutions/ASSIST. Click the Continue button in the dialog box.

B.               Click the Data Analysis button, then select Time Series and Seasonal Adjustment from the popup menus.

C.               Select the Retail table from the Mylib library by clicking the Table button, Mylib and Retail. Click OK.

D.               Click the Series to be seasonally adjusted button, then select Sales by clicking it once and clicking the right arrow button (Figure 25).

Figure 25: The column selection dialog box

E.               Click the Date column button and select the Date field.

F.                Click the Quarterly button for Periodicity and make the starting date the first quarter of 1980 with the ending date the second quarter of 1994.  The screen should look like Figure 26.

Figure 26: The seasonal adjustment setup

G.              Click the Run button on the toolbar (  ). The following two screens (Figure 27 and Figure 28) show the relevant output.

Figure 27: Tests for seasonality

 

Figure 28: Seasonal factors

H.               The data shows seasonality.  The seasonal factors are the percentage of the mean quarterly sales for each year in each quarter.

I.                    Time series data that has a frequency less than annual (monthly or quarterly) should always be tested for seasonality and seasonally adjusted when appropriate.

J.                Clear the Output window by selecting Edit/Clear all.  Then seasonally adjust the "Airline passenger miles, domestic" column in the USEcon table. (Hint: if you need to see the column descriptions, open the SAS Explorer in the left frame and double-click the USEcon table to View it. Remember, you can see the SAS column name by right-clicking the column title.)

Your results should look like Figure 29.

Figure 29: Domestic airline passenger miles are seasonal

K.               Clear the output window and close the seasonal adjustment window.

VIII.       Univariate Data Analysis

A.               You should be looking at the USEcon table.  If not, open it from Mylib.

B.               Open the ASSIST window and click the Data Analysis button.  Select Elementary and Summary Statistics from the popup menus.  You'll see the Summary Statistics window (Figure 30).

Figure 30: Summary Statistics

C.               Click the Columns button and select AIRPMD.  Then click Mean, Standard deviation, Coefficient of variation, Skewness and Kurtosis. Then click the Run button on the toolbar. You should see Figure 31.

Figure 31: The MEANS Procedure results

1.                 You may be curious about the Class button.  Here's an excerpt from the SAS help system:

"The Class selection gives you a list of all the columns in the selected table, excluding those that already are in use in the active task. Choose the column(s) you want to use as classification column(s).

A classification column is any column, numeric or character, that is used for classifying the data into groups or categories of information. A class column normally has a small number of discrete values, or unique levels, which define subgroups of the data.

In this task the class column is used to compute statistics separately for categories of rows."

2.                 If you want, you can store the results in another SAS table by clicking the Output table button.  You can select a part of your data set with Subset data.  The Additional options button mainly allows you to format the output.

D.               Clear the output window, then select Data Analysis/Interactive. Open Mylib.MyProductSales and set up the window so it looks like .

Figure 32: Interactive data analysis window

E.               Click the Summarize button. Select Actual and Product (Figure 33) then click OK.

Figure 33: Interactive data analysis variable selection

F.                This draws a histogram (Figure 34).

Figure 34: Histogram

G.              Click X to close this window.

H.               Click the variable name Actual in the bottom half of the window.  SAS will tell you it can't create a graph; just press Enter, then click the Statistics button. You'll see .

Figure 35: Descriptive statistics

I.                    This is probably an easier way to get all the summary statistics.  If you click the Frequencies button, you'll get a table of frequencies (which will be a very long table for Actual sales).

J.                Click X until you get back to the Data Analysis window (Figure 32).

IX.            Multivariate Data Analysis

About time, isn't it?

A.               SAS Options

1.                 Click the Data Analysis button, then select Regression and Linear from the two popup menus.

The Time series option is used to correct for autocorrelation.

2.                 Select Mylib.MyProductSales.  This opens the Regression Analysis window (Figure 36).

Figure 36: Regression window

B.               Selecting a Subset of Data

1.                 Click the Subset data button to open the corresponding window (Figure 37).

Figure 37: Subset Data window

2.                 Click the WHERE clause: button to open the Build Where clause window (Figure 38).

Figure 38: Build WHERE Clause

TIP: SAS doesn't behave like the rest of the Windows world.  When you click a radio button, it automatically opens the next window.  Be prepared for this!

3.                 Click the Column radio button to open the column selection window (Figure 39).

Figure 39: Column selection

To make a selection from this window, click the column you want (PRODTYPE) and the  button. You won't need to click OK.

4.                 Click the Comparison operator radio button (Figure 40).

Figure 40: Build WHERE clause

5.                 Select equal from the popup menu, then click the Look up constants radio button (Figure 40). This will open a window that lets you select a product type (Figure 41).  Click Furniture then click OK.

Figure 41: Select a Product Type

6.                 Even though it's difficult to tell, you've constructed a WHERE clause that lets you work with only table rows for which PRODTYPE='FURNITURE'. Click OK, then Goback to return to the regression window.

C.               Setting Up the Regression

1.                 Select the dependent variable and the independent variable(s) by clicking the respective buttons. The example shown has Actual sales as a function of Predicted sales.

2.                 Click the Additional Options button and explore the various statistical and reporting options.

3.                 When you're done fooling around, close those windows and click the Run button on the toolbar (). You'll see Figure 42.

Figure 42: Regression Output

Note that you can save this output as a text file.  First, though, notice that the "Output" window is automatically opened.  You might look at the "Log" window, too.

When you save the output, select the RTF data type.  This will let you easily read your output file into Word for editing. (Don't edit the results!)

X.              Conclusion

A.               This has been an introduction to SAS

B.               There is an entire programming language built in to this product.  However, it is somewhat difficult to learn (to put it mildly).

C.               A sample SAS program is shown below.

libname wine 'c:\wine';

PROC REG data=wine.vf95cf SIMPLE CORR;

MODEL nprice=laplus lagold lasilv labrnz ocplus ocgold ocsilv ocbrnz

             rivplus rivgold rivsilv rivbrnz sfplus sfgold sfsilv sfbrnz

             dmnplus dmngold dmnsilv dmnbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

MODEL nprice=sacplus sacgold sacsilv sacbrnz

             nwiplus nwigold nwisilv nwibrnz wcplus wcgold wcsilv wcbrnz

             sdplus sdgold sdsilv sdbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

PROC REG data=wine.vf95cs SIMPLE CORR;

MODEL nprice=laplus lagold lasilv labrnz ocplus ocgold ocsilv ocbrnz

             rivplus rivgold rivsilv rivbrnz sfplus sfgold sfsilv sfbrnz

             dmnplus dmngold dmnsilv dmnbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

MODEL nprice=sacplus sacgold sacsilv sacbrnz

             nwiplus nwigold nwisilv nwibrnz wcplus wcgold wcsilv wcbrnz

             sdplus sdgold sdsilv sdbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

PROC REG data=wine.vf95ch SIMPLE CORR;

MODEL nprice=laplus lagold lasilv labrnz ocplus ocgold ocsilv ocbrnz

             rivplus rivgold rivsilv rivbrnz sfplus sfgold sfsilv sfbrnz

             dmnplus dmngold dmnsilv dmnbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

MODEL nprice=sacplus sacgold sacsilv sacbrnz

             nwiplus nwigold nwisilv nwibrnz wcplus wcgold wcsilv wcbrnz

             sdplus sdgold sdsilv sdbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

PROC REG data=wine.vf95gw SIMPLE CORR;

MODEL nprice=laplus lagold lasilv labrnz ocplus ocgold ocsilv ocbrnz

             rivplus rivgold rivsilv rivbrnz sfplus sfgold sfsilv sfbrnz

             dmnplus dmngold dmnsilv dmnbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

MODEL nprice=sacplus sacgold sacsilv sacbrnz

             nwiplus nwigold nwisilv nwibrnz wcplus wcgold wcsilv wcbrnz

             sdplus sdgold sdsilv sdbrnz/SELECTION=STEPWISE DETAILS SLS=0.10;

run;

D.               SAS programs have common characteristics

1.                 They begin by defining a data library (in this case the library name is wine and the location is c:\wine).

2.                 They specify a procedure.  Economists are fond of PROC REG because it does regressions.  Note that each data table is referenced as
<library name>.<table name>.

3.                 Each SAS command must end with a ; .

4.                 The MODEL command is
<dependent variable>=<independent variable list>

E.               Books have been written about SAS.  Most are not very good.  The best way to learn SAS is to either take a course or work with someone who already knows it.


Index

 


"Create new catalog" button..................... 7

"Create new library" button...................... 7

Build Where clause window................... 41

Class button.......................................... 34

classification column, defined................. 34

coefficient of variation............................ 33

column attributes................................... 10

column name..................................... 5, 31

column, defined....................................... 5

Comparison operator radio button......... 43

data entry window................................. 23

data mining tool....................................... 3

data table.......................................... 5, 10

data type............................... 5, 15, 21, 45

data warehousing..................................... 3

delimited table, defined.......................... 11

Explorer frame........................................ 4

field......................................................... 5

field format............................................ 21

field label............................................... 21

field length............................................. 21

field name.............................................. 21

field width............................................... 5

frequencies............................................ 38

Goback button, defined......................... 18

histogram.............................................. 36

kurtosis................................................. 33

library, defined........................................ 5

linear regression..................................... 38

mean..................................................... 33

Mylib, creating........................................ 8

MyProductSales table, creating................ 8

New Table Structure window................ 21

Prdsale table........................................... 6

record..................................................... 5

Regression Analysis window.................. 39

row, defined............................................ 5

Run button................................ 26, 29, 44

SAS ASSIST........................................ 16

SAS Explorer.................................... 4, 31

SAS Import Wizard.............................. 11

SAS program........................................ 46

SAS toolbar............................................ 5

SASHelp library...................................... 6

seasonal adjustment............................... 27

skewness.............................................. 33

spaces, number in field length................. 10

standard deviation................................. 33

Statistical Analysis System....................... 3

Subset data button................................. 40

summary statistics.................................. 32

WHERE clause: button.......................... 41