Friday, 18 September 2015

First-hand experience with SAS

So let's start. This is my first post, so I'm feeling a bit excited. There are some questions I couldn't solve, so please suggest a way to approach them.


Conditional Formats and Labels


*Question1

Run the program here to create a temporary SAS data set called Voter:
data voter;
input Age Party : $1. (Ques1-Ques4)($1. + 1);
datalines;
23 D 1 1 2 2
45 R 5 5 4 1
67 D 2 4 3 3
39 R 4 4 4 4
19 D 2 1 2 1
75 D 3 3 2 3
57 R 4 3 4 4
;
Add formats for Age (0–30, 31–50, 51–70, 71+), Party (D = Democrat, R =
Republican), and Ques1–Ques4 (1=Strongly Disagree, 2=Disagree, 3=No
Opinion, 4=Agree, 5=Strongly Agree). In addition, label Ques1–Ques4 as
follows:
Ques1 The president is doing a good job
Ques2 Congress is doing a good job
Ques3 Taxes are too high
Ques4 Government should cut spending
;
Ans
proc format;
value umar 0-30 = '30 and under'
31-50 = '31 to 50'
51-70 = '51 to 70'
71-high = '71 and over';
value $politics 'D' = 'Democrat'
'R' = 'Republican';
value $likert '1' = 'Strongly Disagree'
 '2'='Disagree'
 '3'='No Opinion' 
 '4'='Agree' 
 '5'='Strongly Agree';
 run;
data voter;
infile datalines;
input Age Party$ (Ques1-Ques4)($1.+1);
datalines;
23 D 1 1 2 2
45 R 5 5 4 1
67 D 2 4 3 3
39 R 4 4 4 4
19 D 2 1 2 1
75 D 3 3 2 3
57 R 4 3 4 4
;
run;
PROC PRINT noobs;
RUN;
proc freq;
format Age umar.
Party $politics.
Ques1-Ques4 $likert.;
label Ques1 = 'The president is doing a good job'
Ques2 = 'Congress is doing a good job'
Ques3 = 'Taxes are too high'
Ques4 = 'Government should cut spending';
run;



*Question2

You want to see frequencies for Questions 1 to 4 from the previous question.
However, you want only three categories: Generally Disagree (combine
Strongly Disagree and Disagree), No Opinion, and Generally Agree
(combine Agree and Strongly Agree). Accomplish this using a new format for
Ques1–Ques4.;
Ans
proc format;
value umar 0-30 = '30 and under'
31-50 = '31 to 50'
51-70 = '51 to 70'
71-high = '71 and over';
value $politics 'D' = 'Democrat'
'R' = 'Republican';
value $likert '1','2'='Generally Disagree'
 '3'='No Opinion'
 '4','5'='Generally Agree';
 run;
data voter;
infile datalines;
input Age Party$ (Ques1-Ques4)($1.+1);
datalines;
23 D 1 1 2 2
45 R 5 5 4 1
67 D 2 4 3 3
39 R 4 4 4 4
19 D 2 1 2 1
75 D 3 3 2 3
57 R 4 3 4 4
;
run;
PROC PRINT noobs;
RUN;
proc freq;
format Age umar.
Party $politics.
Ques1-Ques4 $likert.;
label Ques1 = 'The president is doing a good job'
Ques2 = 'Congress is doing a good job'
Ques3 = 'Taxes are too high'
Ques4 = 'Government should cut spending';
run;
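
Since the Voter data set already exists after Question 1, the DATA step does not have to be rerun just to regroup the answers. A minimal sketch, assuming Voter is still in the WORK library; the format name $threeway is a hypothetical choice made so that the five-level $likert format stays available alongside it:

```sas
proc format;
   /* Hypothetical three-level format; $likert is left untouched */
   value $threeway '1','2' = 'Generally Disagree'
                   '3'     = 'No Opinion'
                   '4','5' = 'Generally Agree';
run;

/* Assumes the Voter data set from Question 1 is still in WORK */
proc freq data=voter;
   tables Ques1-Ques4;
   format Ques1-Ques4 $threeway.;
run;
```

PROC FREQ groups on the formatted values, so the five raw codes collapse into three rows per question.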



*Question 3

Run the following program to create a SAS data set called Colors (see Chapter 21 for
a discussion of the double at signs [@@] in the INPUT statement):
data colors;
input Color : $1. @@;
datalines;
R R B G Y Y . . B G R B G Y P O O V V B
;
Ans
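proc format;
value $col 'R','B','G' = 'Group 1'
'Y','O' = 'Group 2'
' ' = 'Not Given' /* list input reads a lone period as a missing (blank) character value */
other = 'Group 3';
run;
data colors;
input Color : $1. @@;
datalines;
R R B G Y Y . . B G R B G Y P O O V V B
;
run;
proc freq data=colors;
tables Color / missing; /* MISSING keeps the blank values in the table */
format Color $col.;
run;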
 

Performing Conditional Processing

*Question1

Run the program here to create a temporary SAS data set called School:
data school;
input Age Quiz : $1. Midterm Final;
/* Add your statements here */
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
Using IF and ELSE IF statements, compute two new variables as follows: Grade
(numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.
The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75,
D = 70, and F = 65. Using this information, compute a course grade (Course) as a
weighted average of the Quiz (20%), Midterm (30%) and Final (50%).;
Ans
data school;
input Age Quiz : $1. Midterm Final;
if Age eq 12 then Grade= 6;
else if Age eq 13 then Grade = 8;
if Quiz eq 'A' then Course = ((0.2*95)+(0.3*Midterm)+(0.5*Final));
else if Quiz eq 'B' then Course = ((0.2*85)+(0.3*Midterm)+(0.5*Final));
else if Quiz eq 'C' then Course = ((0.2*75)+(0.3*Midterm)+(0.5*Final));
else if Quiz eq 'D' then Course = ((0.2*70)+(0.3*Midterm)+(0.5*Final));
else if Quiz eq 'F' then Course = ((0.2*65)+(0.3*Midterm)+(0.5*Final));
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
run;
proc print noobs;
run;
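
The weighted-average formula is the same for every quiz letter, so one way to avoid repeating it is to translate the letter into a number first and then apply the formula once. A minimal sketch using the numeric equivalents stated in the question; the data set name school_alt and the variable QuizScore are hypothetical names introduced only for this illustration:

```sas
data school_alt;
   input Age Quiz : $1. Midterm Final;
   /* Translate the quiz letter into its numeric equivalent once */
   select (Quiz);
      when ('A') QuizScore = 95;
      when ('B') QuizScore = 85;
      when ('C') QuizScore = 75;
      when ('D') QuizScore = 70;
      when ('F') QuizScore = 65;
      otherwise  QuizScore = .;   /* unknown letter: leave missing */
   end;
   /* Weighted average: Quiz 20%, Midterm 30%, Final 50% */
   Course = 0.2*QuizScore + 0.3*Midterm + 0.5*Final;
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
run;
proc print data=school_alt noobs;
run;
```

For the first observation this gives 0.2*95 + 0.3*92 + 0.5*95 = 19 + 27.6 + 47.5 = 94.1.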


*Question3

Using the Sales data set, list the observations for employee numbers (EmpID) 9888
and 0177. Do this two ways, one using OR operators and the other using the IN
operator.;
Ans
proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;
run;
proc print;
run;
proc print data=Sales;
where EmpID eq '9888' or EmpID eq '0177';
run;

proc print data=Sales;
where EmpID in ('9888' '0177');
run;


*Question4

Using the Sales data set, create a new, temporary SAS data set containing Region
and TotalSales plus a new variable called Weight with values of 1.5 for the North
Region, 1.7 for the South Region, and 2.0 for the West and East Regions. Use a
SELECT statement to do this;
Ans
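data NewFile;
set Sales (keep=Region TotalSales);
/* The question asks for a SELECT statement rather than IF/ELSE */
select (Region);
when ('North') Weight = 1.5;
when ('South') Weight = 1.7;
when ('East','West') Weight = 2.0;
otherwise Weight = .; /* any other region is left missing */
end;
run;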
proc print noobs;
var Region TotalSales Weight;
run;


*Question5

Starting with the Blood data set, create a new, temporary SAS data set containing
all the variables in Blood plus a new variable called CholGroup. Define this new
variable as follows:
CholGroup   Chol
Low         Low – 110
Medium      111 – 140
High        141 – High
Use a SELECT statement to do this.;
Ans.
data BloodData;
infile '/folders/myfolders/Practice/blood.txt';
length CholGroup $ 6; /* long enough for 'Medium' so no value is truncated */
input ID$ Gender$ Bloodgroup$ Age$ WBC RBC Chol;
/* The question asks for a SELECT statement rather than IF/ELSE */
select;
when (missing(Chol)) CholGroup = ' ';
when (Chol le 110) CholGroup = 'Low';
when (Chol le 140) CholGroup = 'Medium';
otherwise CholGroup = 'High';
end;
run;
proc print noobs;
run;



*Question6

Using the Sales data set, list all the observations where Region is North and
Quantity is less than 60. Include in this list any observations where the customer
name (Customer) is Pet's are Us.;
Ans
proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;
run;
proc print data=Sales noobs;
where Customer eq "Pet's are Us" OR (Region eq 'North' and Quantity lt 60) ;
run;


Performing Iterative Processing: Looping



*Question10

You are testing three speed-reading methods (A, B, and C) by randomly assigning
10 subjects to each of the three methods. You are given the results as three lines of
reading speeds, each line representing the results from each of the three methods,
respectively. Here are the results:
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
Create a temporary SAS data set from these three lines of data. Each observation
should contain Method (A, B, or C), and Score. There should be 30 observations in
this data set. Use a DO loop to create the Method variable and remember to use a
single trailing @ in your INPUT statement. Provide a listing of this data set using
PROC PRINT.;
Ans
data speed_test;
do Method = 'A','B','C';
do Subj = 1 to 10;
input Score @; /* single trailing @ holds the line for the next loop pass */
output;
end;
end;
drop Subj;
datalines;
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
;
run;
proc print noobs;
run;





Working with Dates

*Data set HOSP;
data hosp;
   do j = 1 to 1000;
      AdmitDate = int(ranuni(1234)*1200 + 15500);
      quarter = intck('qtr','01jan2002'd,AdmitDate);
      do i = 1 to quarter;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then
            AdmitDate = AdmitDate + 1;
         if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then
            AdmitDate = AdmitDate - int(3*ranuni(0) + 1);
         DOB = int(25000*Ranuni(0) + '01jan1920'd);
         DischrDate = AdmitDate + abs(10*rannor(0) + 1);
         Subject + 1;
         output;
      end;
   end;
   drop i j;
   format AdmitDate DOB DischrDate mmddyy10.;
run;
proc print data=hosp (obs=10);
run;


*Question4

Using the Hosp data set, compute the subject’s ages two ways: as of January 1, 2006
(Call it AgeJan1), and as of today’s date (call it AgeToday). The variable DOB
represents the date of birth. Take the integer portion of both ages. List the first 10
observations.;
Ans
data AgeJan1;
set hosp;
/* INT keeps only the integer portion of the age */
AgeJan1 = int(yrdif(DOB,'01Jan2006'd,'Actual'));
run;
title "Listing of AgeJan1";
proc print data=AgeJan1 (obs=10);
run;

data AgeToday;
set hosp;
AgeToday = int(yrdif(DOB,Today(),'Actual'));
run;
title "Listing of AgeToday";
proc print data=AgeToday (obs=10);
run;
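
Both ages can also be computed in a single DATA step. A minimal sketch that uses one hypothetical date of birth instead of the Hosp data, so it runs on its own; the data set name age_demo and the DOB value are assumptions made only for illustration:

```sas
data age_demo;
   DOB = '15mar1950'd;   /* hypothetical date of birth */
   /* INT drops the fractional part of the age in years */
   AgeJan1  = int(yrdif(DOB, '01jan2006'd, 'Actual'));
   AgeToday = int(yrdif(DOB, today(), 'Actual'));
   format DOB mmddyy10.;
run;
title "Ages as of January 1, 2006 and as of today";
proc print data=age_demo noobs;
run;
```

For this date of birth AgeJan1 is 55, since the 56th birthday falls in March 2006. AgeToday depends on the date the program is run.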