SAS: September 2015

So let’s start. This is my first post, so I am feeling a bit excited. There are some questions which I couldn't solve, so kindly suggest a way out for those questions.

Conditional Formats and Labels

*Question1

Run the program here to create a temporary SAS data set called Voter:

data voter;

input Age Party : $1. (Ques1-Ques4)($1. + 1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

Add formats for Age (0–30, 31–50, 51–70, 71+), Party (D = Democrat, R =

Republican), and Ques1–Ques4 (1=Strongly Disagree, 2=Disagree, 3=No

Opinion, 4=Agree, 5=Strongly Agree). In addition, label Ques1–Ques4 as

follows:

Ques1 The president is doing a good job

Ques2 Congress is doing a good job

Ques3 Taxes are too high

Ques4 Government should cut spending

;

Ans

proc format;

value umar 0-30 = '0 to 30'

31-50 = '31 to 50'

51-70 = '51 to 70'

71-high = '71 and above';

value $politics 'D' = 'Democrat'

'R' = 'Republican';

value $likert '1' = 'Strongly Disagree'

'2'='Disagree'

'3'='No Opinion'

'4'='Agree'

'5'='Strongly Agree';

run;

data voter;

infile datalines;

input Age Party$ (Ques1-Ques4)($1.+1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

run;

PROC PRINT noobs;

RUN;

proc freq;

format Age umar.

Party $politics.

Ques1-Ques4 $likert.;

label Ques1 = 'The president is doing a good job'

Ques2 = 'Congress is doing a good job'

Ques3 = 'Taxes are too high'

Ques4 = 'Government should cut spending';

run;
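
The FORMAT and LABEL statements above sit inside PROC FREQ, so they only apply to that one procedure. A minimal sketch of the alternative, storing them in the data set so every later procedure picks them up, could look like this. It assumes the three formats from the PROC FORMAT step above already exist, and voter_fmt is just a name I made up for the illustration.

```sas
/* Sketch: attach the formats and labels permanently to a copy of Voter. */
/* Assumes umar., $politics. and $likert. were created by PROC FORMAT.   */
data voter_fmt;   /* hypothetical data set name */
   set voter;
   format Age umar.
          Party $politics.
          Ques1-Ques4 $likert.;
   label Ques1 = 'The president is doing a good job'
         Ques2 = 'Congress is doing a good job'
         Ques3 = 'Taxes are too high'
         Ques4 = 'Government should cut spending';
run;

/* No FORMAT or LABEL statement is needed here any more. */
proc freq data=voter_fmt;
   tables Age Party Ques1-Ques4;
run;
```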

*Question2

You want to see frequencies for Questions 1 to 4 from the previous question.

However, you want only three categories: Generally Disagree (combine

Strongly Disagree and Disagree), No Opinion, and Generally Agree

(combine Agree and Strongly Agree). Accomplish this using a new format for

Ques1–Ques4.;

Ans

proc format;

value umar 0-30 = '0 to 30'

31-50 = '31 to 50'

51-70 = '51 to 70'

71-high = '71 and above';

value $politics 'D' = 'Democrat'

'R' = 'Republican';

value $likert '1','2'='Generally Disagree'

'3'='No Opinion'

'4','5'='Generally Agree';

run;

data voter;

infile datalines;

input Age Party$ (Ques1-Ques4)($1.+1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

run;

PROC PRINT noobs;

RUN;

proc freq;

format Age umar.

Party $politics.

Ques1-Ques4 $likert.;

label Ques1 = 'The president is doing a good job'

Ques2 = 'Congress is doing a good job'

Ques3 = 'Taxes are too high'

Ques4 = 'Government should cut spending';

run;
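
Since only the format for Ques1–Ques4 changes in this question, it should not be necessary to rebuild the data set or overwrite the old format. Here is a small sketch under that assumption: it defines the three-category format under a separate name ($threecat is a name I chose, not one from the book) and applies it to the Voter data set that already exists from Question 1.

```sas
/* Sketch: a second, separately named format for the collapsed categories. */
/* $threecat is a hypothetical name; Voter is assumed to exist already.    */
proc format;
   value $threecat '1','2' = 'Generally Disagree'
                   '3'     = 'No Opinion'
                   '4','5' = 'Generally Agree';
run;

proc freq data=voter;
   tables Ques1-Ques4;
   format Ques1-Ques4 $threecat.;
run;
```

Keeping both formats around means the five-category and three-category tables can be produced from the same data set just by switching the FORMAT statement.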

*Question 3

Run the following program to create a SAS data set called Colors (see Chapter 21 for

a discussion of the double at signs [@@] in the INPUT statement):

data colors;

input Color : $1. @@;

datalines;

R R B G Y Y . . B G R B G Y P O O V V B

;

Ans

proc format;

value $col 'R','B','G' = 'Group 1'

'Y','O' = 'Group 2'

' ' = 'Not Given'

other = 'Group 3';

run;

data colors;

input Color : $1. @@;

datalines;

R R B G Y Y . . B G R B G Y P O O V V B

;run;

proc freq;

tables Color / missing;

format Color $col.;

run;

Performing Conditional Processing

*Question1

Run the program here to create a temporary SAS data set called School:

data school;

input Age Quiz : $1. Midterm Final;

/* Add your statements here */

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;

Using IF and ELSE IF statements, compute two new variables as follows: Grade

(numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.

The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75,

D = 70, and F = 65. Using this information, compute a course grade (Course) as a

weighted average of the Quiz (20%), Midterm (30%) and Final (50%).;

Ans

data school;

input Age Quiz : $1. Midterm Final;

if Age eq 12 then Grade= 6;

else if Age eq 13 then Grade = 8;

if Quiz eq 'A' then Course = ((0.2*95)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'B' then Course = ((0.2*85)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'C' then Course = ((0.2*75)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'D' then Course = ((0.2*70)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'F' then Course = ((0.2*65)+(0.3*Midterm)+(0.5*Final));

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;

run;

proc print noobs;

run;
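
As a quick sanity check on the weighted average, the first observation (Quiz A, Midterm 92, Final 95) can be worked by hand: 0.2*95 + 0.3*92 + 0.5*95 = 19 + 27.6 + 47.5 = 94.1. The short step below is only a sketch that lets SAS do the same arithmetic and write the result to the log, so it can be compared with the Course value in the listing.

```sas
/* Sketch: recompute Course for the first observation and write it to the log. */
data _null_;
   Course = 0.2*95 + 0.3*92 + 0.5*95;   /* Quiz A = 95, Midterm = 92, Final = 95 */
   put Course=;
run;
```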

*Question3

Using the Sales data set, list the observations for employee numbers (EmpID) 9888

and 0177. Do this two ways, one using OR operators and the other using the IN

operator.;
Ans

proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;

run;

proc print;

run;

proc print data=Sales;

where EmpID eq '9888' or EmpID eq '0177';

run;

proc print data=Sales;

where EmpID in ('9888' '0177');

run;

*Question4

Using the Sales data set, create a new, temporary SAS data set containing Region

and TotalSales plus a new variable called Weight with values of 1.5 for the North

Region, 1.7 for the South Region, and 2.0 for the West and East Regions. Use a

SELECT statement to do this;

Ans

data NewFile;

set Sales (keep=Region TotalSales);

select (Region);

when ('North') Weight = 1.5;

when ('South') Weight = 1.7;

when ('West','East') Weight = 2.0;

otherwise;

end;

run;

proc print noobs;

var Region TotalSales Weight;

run;

*Question5

Starting with the Blood data set, create a new, temporary SAS data set containing

all the variables in Blood plus a new variable called CholGroup. Define this new

variable as follows:

CholGroup    Chol

Low          Low – 110

Medium       111 – 140

High         141 – High

Use a SELECT statement to do this.;

Ans.

data BloodData;

infile '/folders/myfolders/Practice/blood.txt';

input ID$ Gender$ Bloodgroup$ Age$ WBC RBC Chol;

length CholGroup $ 6;

select;

when (missing(Chol)) CholGroup = ' ';

when (Chol le 110) CholGroup = 'Low';

when (Chol le 140) CholGroup = 'Medium';

otherwise CholGroup = 'High';

end;

run;

proc print noobs;

run;

*Question6

Using the Sales data set, list all the observations where Region is North and

Quantity is less than 60. Include in this list any observations where the customer

name (Customer) is Pet's are Us.;

Ans

proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;

run;

proc print data=Sales noobs;

where Customer eq "Pet's are Us" OR (Region eq 'North' and Quantity lt 60) ;

run;

Performing Iterative Processing: Looping

*Question10

You are testing three speed-reading methods (A, B, and C) by randomly assigning

10 subjects to each of the three methods. You are given the results as three lines of

reading speeds, each line representing the results from each of the three methods,

respectively. Here are the results:

250 255 256 300 244 268 301 322 256 333

267 275 256 320 250 340 345 290 280 300

350 350 340 290 377 401 380 310 299 399

Create a temporary SAS data set from these three lines of data. Each observation

should contain Method (A, B, or C), and Score. There should be 30 observations in

this data set. Use a DO loop to create the Method variable and remember to use a

single trailing @ in your INPUT statement. Provide a listing of this data set using

PROC PRINT.;

Ans

data speed_test;

do Method='A','B','C';

do subj=1 to 10;

input Score @;

output;

end;

end;

datalines;

250 255 256 300 244 268 301 322 256 333

267 275 256 320 250 340 345 290 280 300

350 350 340 290 377 401 380 310 299 399

;

run;

proc print noobs;

run;

Working with Dates

*Data set HOSP;

data hosp;

do j = 1 to 1000;

AdmitDate = int(ranuni(1234)*1200 + 15500);

quarter = intck('qtr','01jan2002'd,AdmitDate);

do i = 1 to quarter;

if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then

AdmitDate = AdmitDate + 1;

if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then

AdmitDate = AdmitDate - int(3*ranuni(0) + 1);

DOB = int(25000*Ranuni(0) + '01jan1920'd);

DischrDate = AdmitDate + abs(10*rannor(0) + 1);

Subject + 1;

output;

end;

end;

drop i j;

format AdmitDate DOB DischrDate mmddyy10.;

run;

proc print data=hosp (obs=10);

run;

*Question4

Using the Hosp data set, compute the subject’s ages two ways: as of January 1, 2006

(Call it AgeJan1), and as of today’s date (call it AgeToday). The variable DOB

represents the date of birth. Take the integer portion of both ages. List the first 10

observations.;

Ans

data AgeJan1;

set hosp;

AgeJan1=int(yrdif(DOB,'01Jan2006'd,'Actual'));

run;

title "Listing of AgeJan1";

proc print data=AgeJan1 (obs=10);

run;

data AgeToday;

set hosp;

AgeToday=int(yrdif(DOB,Today(),'Actual'));

run;

title "Listing of AgeToday";

proc print data=AgeToday (obs=10);

run;
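
Both ages can probably also be computed in a single DATA step, which keeps them side by side in one listing. The sketch below assumes the Hosp data set created above; the data set name ages is my own choice for the illustration.

```sas
/* Sketch: compute both ages in one pass over Hosp.        */
/* INT keeps only the integer portion, as the question asks. */
data ages;   /* hypothetical data set name */
   set hosp;
   AgeJan1  = int(yrdif(DOB, '01jan2006'd, 'Actual'));
   AgeToday = int(yrdif(DOB, today(), 'Actual'));
run;

title "Listing of AgeJan1 and AgeToday";
proc print data=ages (obs=10);
   var Subject DOB AgeJan1 AgeToday;
run;
```

Note that AgeToday depends on the day the program is run, so its values will differ from one run to the next.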

SAS

Friday, 18 September 2015

First hand experience on SAS
