Friday, 2 October 2015

Some more examples...

Chapter 16: Summarizing Your Data

SAS has several procedures for reporting your data in the form you want to see it presented. When deciding which procedure best fits your needs, consider the following factors:
 ● What is the purpose of the summary?
 ● What information do you need in the report?
 ● Do you need a specific layout or customized information?
 ● Do you need cross-tabulations or hierarchical groupings?
 ● Do you need statistics?
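As a rough illustration of how these questions steer the choice, the sketch below answers one question (how do students break down by school size?) in two ways: PROC FREQ when you only need counts and percentages, and PROC MEANS when you need statistics. It assumes the College data set created in Problem 16-1 below already exists in the WORK library.

```sas
/* Sketch: assumes WORK.COLLEGE from Problem 16-1 has been created */

/* Counts and percentages only: PROC FREQ */
proc freq data=college;
   tables SchoolSize;
run;

/* Summary statistics within each group: PROC MEANS */
proc means data=college n mean maxdec=2;
   class SchoolSize;
   var GPA;
run;
```

When you need a custom layout or nested groupings, PROC TABULATE and PROC REPORT (Chapter 18) are the usual next step.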

Solved Problems

*Q16-1

*Using the SAS data set College, compute the mean, median, minimum, and
maximum and the number of both missing and non-missing values for the variables
ClassRank and GPA. Report the statistics to two decimal places;

*Data set COLLEGE;

proc format ;
   value $yesno 'Y','1' = 'Yes'
                'N','0' = 'No'
                ' '     = 'Not Given';
   value $size 'S' = 'Small'
               'M' = 'Medium'
               'L' = 'Large'
               ' ' = 'Missing';
   value $gender 'F' = 'Female'
                 'M' = 'Male'
                 ' ' = 'Not Given';
run;

data college;
   length StudentID $ 5 Gender SchoolSize $ 1;
   do i = 1 to 100;
      StudentID = put(round(ranuni(123456)*10000),z5.);
      if ranuni(0) lt .4 then Gender = 'M';
      else Gender = 'F';
      if ranuni(0) lt .3 then SchoolSize = 'S';
      else if ranuni(0) lt .7 then SchoolSize = 'M';
      else SchoolSize = 'L';
      if ranuni(0) lt .2 then Scholarship = 'Y';
      else Scholarship = 'N';
      GPA = round(rannor(0)*.5 + 3.5,.01);
      if GPA gt 4 then GPA = 4;
      ClassRank = int(ranuni(0)*60 + 41);
      if ranuni(0) lt .1 then call missing(ClassRank);
      if ranuni(0) lt .05 then call missing(SchoolSize);
      if ranuni(0) lt .05 then call missing(GPA);
      output;
   end;
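   /* $gender1. has a width of 1, so only the first character */
   /* of each label is printed (F or M)                        */
   format Gender $gender1.
          SchoolSize $size.
          Scholarship $yesno.;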
   drop i;
run;



title "Statistics on the College Data Set";
proc means data=college n nmiss mean median min max maxdec=2;
   var ClassRank GPA;
run;
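MAXDEC= only controls the printed listing. If you also want these statistics in a data set for later steps, one option is the OUTPUT statement. In this sketch the output variable names (N_Rank, Mean_GPA, and so on) and the data set name college_stats are arbitrary choices. The names listed after each statistic keyword are assigned to the VAR variables in order.

```sas
/* Sketch: output names below are hypothetical, chosen for illustration */
proc means data=college noprint;
   var ClassRank GPA;
   /* Names after each keyword map to ClassRank, then GPA */
   output out=college_stats
          n=N_Rank N_GPA
          nmiss=Miss_Rank Miss_GPA
          mean=Mean_Rank Mean_GPA
          median=Median_Rank Median_GPA
          min=Min_Rank Min_GPA
          max=Max_Rank Max_GPA;
run;

/* MAXDEC= does not carry over to a data set, so apply a format instead */
proc print data=college_stats(drop=_type_ _freq_) noobs;
   format Mean_Rank Mean_GPA Median_Rank Median_GPA Min_GPA Max_GPA 8.2;
run;
```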





*Q16-3 

Using the SAS data set College, report the mean and median GPA and ClassRank
broken down by school size (SchoolSize). Do this twice, once using a BY statement,
and once using a CLASS statement.;


proc sort data=college out=college1;
   by SchoolSize;
run;

title "Statistics on the College Data Set - Using BY";
title2 "Broken down by School Size";
proc means data=college1 n mean median min max maxdec=2;
   by SchoolSize;
   var ClassRank GPA;
run;

title "Statistics on the College Data Set - Using CLASS";
title2 "Broken down by School Size";
proc means data=college n mean median min max maxdec=2;
   class SchoolSize;
   var ClassRank GPA;
run;
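The two runs can differ because the data step gives each SchoolSize value a 5 percent chance of being set to missing. With BY processing, a missing value is simply another BY value, so those students appear in their own group, which the $size. format labels 'Missing'. With a CLASS statement, PROC MEANS drops observations whose class value is missing unless you add the MISSING option. The following is a minimal sketch of that variant:

```sas
title "Statistics on the College Data Set - CLASS with MISSING";
title2 "Broken down by School Size, including missing values";
/* MISSING treats a blank SchoolSize as a valid class level */
proc means data=college n mean median maxdec=2 missing;
   class SchoolSize;
   var ClassRank GPA;
run;
```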




Chapter 17: Counting Frequencies

PROC FREQ can produce frequency and cross-tabulation tables from either raw data or cell count data. Raw data, also known as case-record data, contain one record for each subject or sample member. Cell count data present the data as a table that lists each combination of data values along with its frequency count.
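To show how the two forms relate, the sketch below runs PROC FREQ on the raw College records to write a cell count data set, with one row per Gender/Scholarship combination plus a COUNT variable. It then passes that data set back to PROC FREQ with a WEIGHT statement. The data set name cellcounts is a name chosen for this illustration. The second table should match a crosstab run directly on the raw data.

```sas
/* Step 1: collapse the raw (case-record) data to cell counts.     */
/* OUT= writes the TABLES variables plus COUNT and PERCENT.        */
proc freq data=college noprint;
   tables Gender*Scholarship / out=cellcounts(drop=percent);
run;

/* Step 2: analyze the cell count data. WEIGHT tells PROC FREQ     */
/* that each row stands for COUNT observations.                    */
proc freq data=cellcounts;
   tables Gender*Scholarship;
   weight Count;
run;
```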

Solved Problems

*Q17-3

Using the data set Blood, produce frequencies for the variable Chol (cholesterol).
Use a format to group the frequencies into three groups: low to 200 (normal), 201
and higher (high), and missing. Run PROC FREQ twice, once using the MISSING
option, and once without. Compare the percentages in both listings.;

*Data set BLOOD;
data blood;
   infile '/folders/myfolders/Practice/blood.txt' truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject
         Gender
         BloodType
         AgeGroup
         WBC
         RBC
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
run;



proc format;
   value cholgrp low-200  = 'Normal'
                 201-high = 'High'
                 .        = 'Missing';
run;

title "Demonstrating the MISSING Option";
title2 "Without MISSING Option";
proc freq data=blood;
   tables Chol / nocum;
   format Chol cholgrp.;
run;

title "Demonstrating the MISSING Option";
title2 "With MISSING Option";
proc freq data=blood;
   tables Chol / nocum missing;
   format Chol cholgrp.;
run;
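A third variation, the MISSPRINT option, falls between the two runs above. It displays the frequency of the Missing row in the table, but it still computes the percentages only from the nonmissing values. This sketch assumes the Blood data set and the cholgrp format defined above:

```sas
title "Demonstrating the MISSING Option";
title2 "With MISSPRINT Option";
/* Missing frequencies are shown but excluded from the percentages */
proc freq data=blood;
   tables Chol / nocum missprint;
   format Chol cholgrp.;
run;
```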




*Q17-5

Using the SAS data set College, create a two-way table of Scholarship (rows) by
ClassRank (columns). Use a user-defined format to group class rank into two groups:
70 and lower, and 71 and higher. (Please see the note in Chapter 16, Problem 2,
about the permanent formats used in this data set.);



proc format;
   value rank low-70  = 'Low to 70'
              71-high = '71 and higher';
run;

title "Scholarship by Class Rank";
proc freq data=college;
   tables Scholarship*ClassRank;
   format ClassRank rank.;
run;
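By default, each cell of a two-way table shows the frequency plus the overall, row, and column percentages. Suppose the question is what share of each class-rank group received a scholarship. In that case the column percentages alone may be easier to read, and this sketch suppresses the others with NOROW and NOPERCENT:

```sas
title "Scholarship by Class Rank (column percentages only)";
/* NOROW and NOPERCENT leave the frequency and column percent in each cell */
proc freq data=college;
   tables Scholarship*ClassRank / norow nopercent;
   format ClassRank rank.;
run;
```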



Chapter 18: Creating Tabular Reports

PROC REPORT is an excellent tool for preparing standard reports. Not all reports, however, fit the standard. One advantage of PROC REPORT over PROC TABULATE is its ability to define columns and to manipulate the rows of a report. One important report used by statisticians is a table of cumulative totals or cumulative percentages. For example, a survey report might show both the number of responses received each month and the cumulative number of responses to date. It is often also important to show the change from one row to the next.
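The following is a minimal sketch of cumulative totals and row-to-row change in PROC REPORT. It uses SASHELP.CLASS, which ships with SAS, instead of survey data, and Age stands in for the month. The computed columns CumN and Change and the temporary variables running and prev are names chosen for this illustration. The running total relies on temporary variables in COMPUTE blocks keeping their values from one row to the next.

```sas
/* Sketch: Age plays the role of "month"; CumN and Change are computed */
proc report data=sashelp.class nowd;
   column Age n CumN Change;
   define Age    / group 'Age';
   define n      / 'Count';
   define CumN   / computed 'Cumulative Count' format=5.;
   define Change / computed 'Change from Previous Row' format=5.;

   /* Runs once at the start of the report: initialize temporaries */
   compute before;
      running = 0;
      prev = .;
   endcomp;

   /* Columns are computed left to right, so n is already set here */
   compute CumN;
      running = running + n;
      CumN = running;
   endcomp;

   /* The first row has no previous row, so Change stays missing */
   compute Change;
      if prev ne . then Change = n - prev;
      prev = n;
   endcomp;
run;
```

A cumulative percentage column could follow the same pattern if you divide running by the overall total.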


Solved Problems


*18-3; 



Solution



proc format;
   value $gender 'F' = 'Female'
                 'M' = 'Male';
run;

title "Demographics from COLLEGE Data Set";
proc tabulate data=college format=6.;
   class Gender Scholarship SchoolSize;
   tables (Gender all)*(Scholarship all),
          SchoolSize all / rts=25;
   keylabel n=' '
            all='Total';
   format Gender $gender.;
run;
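Like a CLASS statement in PROC MEANS, PROC TABULATE drops any observation that has a missing value for a CLASS variable. Students whose SchoolSize is missing are therefore left out of the table above, including its Total column. Adding the MISSING option to the CLASS statement keeps them, and the $size. format attached in the data step labels that column 'Missing'. This variant is a sketch:

```sas
title "Demographics from COLLEGE Data Set (including missing School Size)";
proc tabulate data=college format=6.;
   /* MISSING keeps observations whose class values are missing */
   class Gender Scholarship SchoolSize / missing;
   tables (Gender all)*(Scholarship all),
          SchoolSize all / rts=25;
   keylabel n=' '
            all='Total';
   format Gender $gender.;
run;
```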
