SAS: 2015

Some more examples...

C h a p t e r 16Summarizing Your Data

SAS has a number of procedures to help you report your data in the form you would like to see it presented. In deciding which procedure is most appropriate for your needs, you must consider a number of factors. These factors include the following.
● What is the purpose of the summary?
● What information do you need in the report?
● Do you need a specific layout or customized information?
● Do you need cross-tabulations or hierarchical groupings?
● Do you need statistics?

Solved Problems

*16-1

*Using the SAS data set College, compute the mean, median, minimum, and
maximum and the number of both missing and non-missing values for the variables
ClassRank and GPA. Report the statistics to two decimal places;

*Data set COLLEGE;

proc format ;
value $yesno 'Y','1' = 'Yes'
'N','0' = 'No'
' ' = 'Not Given';
value $size 'S' = 'Small'
'M' = 'Medium'
'L' = 'Large'
' ' = 'Missing';
value $gender 'F' = 'Female'
'M' = 'Male'
' ' = 'Not Given';
run;

data college;
length StudentID $ 5 Gender SchoolSize $ 1;
do i = 1 to 100;
StudentID = put(round(ranuni(123456)*10000),z5.);
if ranuni(0) lt .4 then Gender = 'M';
else Gender = 'F';
if ranuni(0) lt .3 then SchoolSize = 'S';
else if ranuni(0) lt .7 then SchoolSize = 'M';
else SchoolSize = 'L';
if ranuni(0) lt .2 then Scholarship = 'Y';
else Scholarship = 'N';
GPA = round(rannor(0)*.5 + 3.5,.01);
if GPA gt 4 then GPA = 4;
ClassRank = int(ranuni(0)*60 + 41);
if ranuni(0) lt .1 then call missing(ClassRank);
if ranuni(0) lt .05 then call missing(SchoolSize);
if ranuni(0) lt .05 then call missing(GPA);
output;
end;
format Gender $gender1.
SchoolSize $size.
Scholarship $yesno.;
drop i;
run;

title "Statistics on the College Data Set";
proc means data= college
n
nmiss
mean
median
min
max
maxdec=2;

var ClassRank GPA;
run;

*Q16-3

Using the SAS data set College, report the mean and median GPA and ClassRank

broken down by school size (SchoolSize). Do this twice, once using a BY statement,
and once using a CLASS statement.;

proc sort data=college out=college1;
by SchoolSize;
run;
title "Statistics on the College Data Set - Using BY";
title2 "Broken down by School Size";
proc means data=college1
n
mean
median
min
max
maxdec=2;
by SchoolSize;
var ClassRank GPA;
run;
title "Statistics on the College Data Set - Using CLASS";
title2 "Broken down by School Size";
proc means data=college
n
mean
median
min
max
maxdec=2;
class SchoolSize;
var ClassRank GPA;
run;

C h a p t e r 17
Counting Frequencies

PROC FREQ can use either raw data or cell count data to produce frequency and cross tabulation tables. Raw data, also known as case-record data, report the data as one record for each subject or sample member. Cell count data report the data as a table, listing all possible combinations of data values along with the frequency counts.

Solved Problems

*Q17-3

Using the data set Blood, produce frequencies for the variable Chol (cholesterol).
Use a format to group the frequencies into three groups: low to 200 (normal), 201
and higher (high), and missing. Run PROC FREQ twice, once using the MISSING
option, and once without. Compare the percentages in both listings.;

*Data set BLOOD;
data blood;
infile '/folders/myfolders/Practice/blood.txt' truncover;
length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
input Subject
Gender
BloodType
AgeGroup
WBC
RBC
Chol;
label Gender = "Gender"
BloodType = "Blood Type"
AgeGroup = "Age Group"
Chol = "Cholesterol";
run;

proc format;
value cholgrp low-200 = 'Normal'
201-high = 'High'
. = 'Missing';
run;
title "Demonstrating the MISSING Option";
title2 "Without MISSING Option";
proc freq data=blood;
tables Chol / nocum;
format Chol cholgrp.;
run;
title "Demonstrating the MISSING Option";
title2 "With MISSING Option";
proc freq data=blood;
tables Chol / nocum missing;
format Chol cholgrp.;
run;

*Q17-5

Using the SAS data set College, create a two-way table of Scholarship (rows) by
ClassRank (columns). Use a user-defined format to group class rank into two groups:
70 and lower, and 71 and higher. (Please see the note in Chapter 16, Problem 2,
about the permanent formats used in this data set.);

proc format;
value rank low-70 = 'Low to 70'
71-high = '71 and higher';
run;
title "Scholarship by Class Rank";
proc freq data=college;
tables Scholarship*ClassRank;
format ClassRank rank.;
run;

So let’s start...this is my first post so feeling a bit excited.There are some questions which couldn't solve ,so kindly do suggest me a way out for those questions.

Conditional Formats and Labels

*Question1

Run the program here to create a temporary SAS data set called Voter:

data voter;

input Age Party : $1. (Ques1-Ques4)($1. + 1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

Add formats for Age (0–30, 31–50, 51–70, 71+), Party (D = Democrat, R =

Republican), and Ques1–Ques4 (1=Strongly Disagree, 2=Disagree, 3=No

Opinion, 4=Agree, 5=Strongly Agree). In addition, label Ques1–Ques4 as

follows:

Ques1 The president is doing a good job

Ques2 Congress is doing a good job

Ques3 Taxes are too high

Ques4 Government should cut spending

;

Ans

proc format;

value umar 0-30 = 'less than 30'

30-50 = '30 to 50'

50-70 = '50 to 70'

70-high = 'above 70';

value $politics 'D' = 'Democrat'

'R' = 'Republican';

value $likert '1' = 'Strongly Disagree'

'2'='Disagree'

'3'='No Opinion'

'4'='Agree'

'5'='Strongly Agree';

run;

data voter;

infile datalines;

input Age Party$ (Ques1-Ques4)($1.+1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

run;

PROC PRINT noobs;

RUN;

proc freq;

format Age umar.

Party $politics.

Ques1-Ques4 $likert.;

label Ques1 = 'The president is doing a good job'

Ques2 = 'Congress is doing a good job'

Ques3 = 'Taxes are too high'

Ques4 = 'Government should cut spending';

run;

*Question2

You want to see frequencies for Questions 1 to 4 from the previous question.

However, you want only three categories: Generally Disagree (combine

Strongly Disagree and Disagree), No Opinion, and Generally Agree

(combine Agree and Strongly Agree). Accomplish this using a new format for

Ques1–Ques4.;

Ans

proc format;

value umar 0-<30 = 'less than 30'

30-<50 = '30 to 50'

50-<70 = '50 to 70'

70-high = 'above 70';

value $politics 'D' = 'Democrat'

'R' = 'Republican';

value $likert '1','2'='Generally Disagree'

'3'='No Opinion'

'4','5'='Generally Agree';

run;

data voter;

infile datalines;

input Age Party$ (Ques1-Ques4)($1.+1);

datalines;

23 D 1 1 2 2

45 R 5 5 4 1

67 D 2 4 3 3

39 R 4 4 4 4

19 D 2 1 2 1

75 D 3 3 2 3

57 R 4 3 4 4

;

run;

PROC PRINT noobs;

RUN;

proc freq;

format Age umar.

Party $politics.

Ques1-Ques4 $likert.;

label Ques1 = 'The president is doing a good job'

Ques2 = 'Congress is doing a good job'

Ques3 = 'Taxes are too high'

Ques4 = 'Government should cut spending';

run;

*Question 3

Run the following program to create a SAS data set called Colors (see Chapter 21 for

a discussion of the double at signs [@@] in the INPUT statement):

data colors;

input Color : $1. @@;

datalines;

R R B G Y Y . . B G R B G Y P O O V V B

;

Ans

proc format;

value $col 'R','B','G' = 'Group 1'

'Y','O' = 'Group 2'

'.' = 'Not Given'

other = 'Group 3'

run;

data colors;

input Color : $1. @@;

datalines;

R R B G Y Y . . B G R B G Y P O O V V B

;run;

proc freq;

format Color $col.;

run;

Performing Conditional Processing

*Question1

Run the program here to create a temporary SAS data set called School:

data school;

input Age Quiz : $1. Midterm Final;

/* Add you statements here */

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;

Using IF and ELSE IF statements, compute two new variables as follows: Grade

(numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.

The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75,

D = 70, and F = 65. Using this information, compute a course grade (Course) as a

weighted average of the Quiz (20%), Midterm (30%) and Final (50%).;

Ans

data school;

input Age Quiz : $1. Midterm Final;

if Age eq 12 then Grade= 6;

else if Age eq 13 then Grade = 8;

if Quiz eq 'A' then Course = ((0.2*95)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'B' then Course = ((0.2*75)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'C' then Course = ((0.2*70)+(0.3*Midterm)+(0.5*Final));

else if Quiz eq 'F' then Course = ((0.2*65)+(0.3*Midterm)+(0.5*Final));

datalines;

12 A 92 95

12 B 88 88

13 C 78 75

13 A 92 93

12 F 55 62

13 B 88 82

;

run;

proc print noobs;

run;

*Question3

Using the Sales data set, list the observations for employee numbers (EmpID) 9888

and 0177. Do this two ways, one using OR operators and the other using the IN

operator.;
Ans

proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;

run;

proc print;

run;

proc print data=Sales;

where EmpID eq '9888' or EmpID eq '0177';

run;

proc print data=Sales;

where EmpID in ('9888' '0177');

run;

*Question4

Using the Sales data set, create a new, temporary SAS data set containing Region

and TotalSales plus a new variable called Weight with values of 1.5 for the North

Region, 1.7 for the South Region, and 2.0 for the West and East Regions. Use a

SELECT statement to do this;

Ans

data NewFile;

set Sales;

if Region eq 'North' then Weight = 1.5;

else if Region eq 'South' then Weight = 1.7;

else Weight = 2.0;

run;

proc print noobs;

var Region TotalSales Weight;

run;

*Question5

Starting with the Blood data set, create a new, temporary SAS data set containing

all the variables in Blood plus a new variable called CholGroup. Define this new

variable as follows:

􀂃 CholGroup Chol

􀂃 Low Low – 110

􀂃 Medium 111 – 140

􀂃 High 141 – High

Use a SELECT statement to do this.;

Ans.

data BloodData;

infile '/folders/myfolders/Practice/blood.txt';

input ID$ Gender$ Bloodgroup$ Age$ WBC RBC Chol;

if Chol le 110 and not missing(Chol) then CholGroup='Low-110';

else if Chol ge 111 and chol le 140 then CholGroup='111-140';

else if Chol ge 141 then CholGroup='141-high';

run;

proc print noobs;

run;

*Question6

Using the Sales data set, list all the observations where Region is North and

Quantity is less than 60. Include in this list any observations where the customer

name (Customer) is Pet's are Us.;

Ans

proc import datafile= '/folders/myfolders/Practice/Sales.xls' dbms=xls out=Sales Replace;

run;

proc print data=Sales noobs;

where Customer eq "Pet's are Us" OR (Region eq 'North' and Quantity lt 60) ;

run;

Performing Iterative Processing: Looping

*Questin10;

You are testing three speed-reading methods (A, B, and C) by randomly assigning

10 subjects to each of the three methods. You are given the results as three lines of

reading speeds, each line representing the results from each of the three methods,

respectively. Here are the results:

250 255 256 300 244 268 301 322 256 333

267 275 256 320 250 340 345 290 280 300

350 350 340 290 377 401 380 310 299 399

Create a temporary SAS data set from these three lines of data. Each observation

should contain Method (A, B, or C), and Score. There should be 30 observations in

this data set. Use a DO loop to create the Method variable and remember to use a

single trailing @ in your INPUT statement. Provide a listing of this data set using

Ans

PROC PRINT;

data speed_test;

do Method_variable='A','B','C';

do subj=1 to 10;

input Scores @@;

output;

end;end;

datalines;

250 255 256 300 244 268 301 322 256 333

267 275 256 320 250 340 345 290 280 300

350 350 340 290 377 401 380 310 299 399

;

run;

proc print noobs;

run;

Working with Dates

*Data set HOSP;

data hosp;

do j = 1 to 1000;

AdmitDate = int(ranuni(1234)*1200 + 15500);

quarter = intck('qtr','01jan2002'd,AdmitDate);

do i = 1 to quarter;

if ranuni(0) lt .1 and weekday(AdmitDate) eq 1 then

AdmitDate = AdmitDate + 1;

if ranuni(0) lt .1 and weekday(AdmitDate) eq 7 then

AdmitDate = AdmitDate - int(3*ranuni(0) + 1);

DOB = int(25000*Ranuni(0) + '01jan1920'd);

DischrDate = AdmitDate + abs(10*rannor(0) + 1);

Subject + 1;

output;

end;

drop i j;

format AdmitDate DOB DischrDate mmddyy10.;

run;

proc print data=hosp (obs=10);

run;

*Question4

Using the Hosp data set, compute the subject’s ages two ways: as of January 1, 2006

(Call it AgeJan1), and as of today’s date (call it AgeToday). The variable DOB

represents the date of birth. Take the integer portion of both ages. List the first 10

observations.;

Ans

data AgeJan1;

set hosp;

Age=yrdif(DOB,'01Jan2006'd,'Actual');

run;

title "Listing of AGES1";

proc print data=AgeJan1 (obs=10);

format Age 5.1;

run;

data AgeToday;

set hosp;

Age=yrdif(DOB,Today(),'Actual');

run;

title "Listing of AGES1";

proc print data=AgeToday (obs=10);

format Age 5.1;

run;

SAS

Friday, 2 October 2015

Some more examples...

C h a p t e r 16Summarizing Your Data

Solved Problems

*16-1

*Q16-3

C h a p t e r 17
Counting Frequencies

Solved Problems

*Q17-3

Friday, 18 September 2015

First hand experience on SAS

Conditional Formats and Labels

*Question1

*Question2

*Question 3

Performing Conditional Processing

*Question1

*Question3

*Question4

*Question5

*Question6

Performing Iterative Processing: Looping

*Questin10;

Working with Dates

*Question4

Friday, 2 October 2015

Some more examples...

C h a p t e r 16Summarizing Your Data

Solved Problems

*16-1

*Q16-3

C h a p t e r 17Counting Frequencies

Solved Problems

*Q17-3

Friday, 18 September 2015

First hand experience on SAS

Conditional Formats and Labels

*Question1

*Question2

*Question 3

Performing Conditional Processing

*Question1

*Question3

*Question4

*Question5

*Question6

Performing Iterative Processing: Looping

*Questin10;

Working with Dates

*Question4

C h a p t e r 17
Counting Frequencies