
[Cockcroft98] Sections 17.5-17.10

 Stefen 2010-10-24

Functions, Procedures, and Programming Notes

SymbEL supports encapsulated, scoped blocks that can return a value. These blocks are referred to as functions and procedures for notational brevity. To complete our picture, we need to make some points about these constructs.

Function Return Types

So far, the functions in the examples have returned only a scalar type, but functions can return double or string as well. More complex types, covered in later sections, can also be returned.

Functions cannot return arrays because there is no syntactic accommodation for doing so. However, there is a way to get around this limitation for arrays of nonstructured types. See “Returning an Array of Nonstructured Type from a Function” on page 552.

Scope

Although variables may be declared local to a function, the default semantics for local variables are for them to be the C equivalent of static. Therefore, even though a local variable has an initialization part in a local scope, this initialization is not performed on each entry to the function. It is done once before the first call and never done again.
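Since the text compares SymbEL locals to C static variables, the behavior can be sketched in C itself. This is only an analogy written in C, not SymbEL code, and the function name next_call_number is made up for illustration.

```c
/* Hypothetical helper: reports how many times it has been called.
 * The initializer runs once, before the first call, which is how the
 * text describes SymbEL locals behaving by default. */
int next_call_number(void)
{
    static int calls = 0;   /* initialized once, not on every entry */

    calls++;
    return calls;
}
```

Each call returns a value one higher than the previous call, because the initialization to zero is never repeated.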

Initializing Variables

Variables can be initialized to values that are compatible with their declared type. This is the case for both simple and structured types. The only exceptional condition in initializing variables is the ability to initialize a global variable with a function call. This capability is supported, but use it with great care. In general, avoid it as bad practice.

Arrays can be given initial values through an aggregate initialization. The syntax is identical to that in C. For example:

int array[ARRAY_SIZE] = {
1, 2, 3, 4, 5, -1
};

The size of the array must be large enough to accommodate the aggregate initialization or the parser will flag it as an error.

Notes About Arrays and Strings

The type string is native to SymbEL and has no equivalent in C. Since string is an atomic type, it is incorrect to use the subscript operator to access individual characters in the string. For this reason, we need to be able to interchange values between variables that are of type string and char[].

Array Assignment

Although pointer types are not allowed in SymbEL, assignment of arrays is allowed provided that the size of the target variable is equal to or greater than the size of the source variable. This is not a pointer assignment, though. Consider it a value assignment where the values of the source array are being copied to the target array.
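The copy semantics described here can be modeled in C roughly as follows. This is a sketch of the rule (copy values, require a large enough target), not the interpreter's code; assign_int_array and its return convention are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeling SymbEL array assignment in C:
 * the values of src are copied into dst, and the assignment is
 * rejected when dst is smaller than src.
 * Returns 0 on success, -1 if the target is too small. */
int assign_int_array(int *dst, size_t dst_len, const int *src, size_t src_len)
{
    if (dst_len < src_len) {
        return -1;                              /* target smaller than source */
    }
    memcpy(dst, src, src_len * sizeof src[0]);  /* values copied, not aliased */
    return 0;
}
```

Because the elements are copied, a later change to the source array does not show up in the target.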

String Type and Character Arrays

Variables declared as type string and arrays of type char have interchangeable values to allow access to individual characters contained in a string. For example:

char tmp[8]; 
string s = "hello";
tmp=s;
if (tmp[0] == 'h') {
...
}

Individual characters of the string cannot be accessed through the s variable itself. If a subscript were used, it would mean that s was an array of strings, not an array of characters. After tmp has been modified in this example, its value can be assigned back to s.

Assignment to String Variables

When a variable of type string is assigned a new value, the existing value of the variable is freed and a new copy of the source string is allocated and assigned to the variable. This is also the case when string variables are assigned the return value of a function that returns type string. See “Using Return Values from Attached Functions” on page 553.

Empty Arrays

When a function accepts an array as a parameter, it is not always convenient to send an array of the same size as the parameter. For this reason, the empty array declaration was added for use in parameter declarations. This is a notation where no subscript size is included in the declaration, just the [] suffix to the variable name. Here is an example.

print_it(int array[])
{
    int i;
    for(i=0; array[i] != -1; i++) {
        printf("%d\n", array[i]);
    }
}
main()
{
    int array[6] = { 1, 2, 3, 4, 5, -1 };
    print_it(array);
}

Upon entry to the function containing the empty array parameter, the parameter variable obtains a size. In this example, the array parameter is given a size of 24 bytes (6 elements of 4 bytes each) upon entry to the function print_it. This size will change for every array passed as an actual parameter.

Recursion

Recursion is not supported. Direct recursion is flagged as an error by the parser. Indirect recursion is silently ignored. The disallowance of recursion is due to a problem in the run time that could not be overcome in the short term and may be fixed in a future release. Examine the following program.

one()
{
    // remember, initialization is only done once
    int i = 0;
    i++;
    switch(i) {
    case 1:
        printf("Here I am\n");
        break;
    case 2:
        printf("Here I am again\n");
        return;
    }
    two();
}
two()
{
    one();
}
main()
{
    one();
}


It seems that the output of this program would be

Here I am 
Here I am again

but, in fact, only the first line will be printed out. The second call to one is detected by the run time, and a return from the function is performed before anything is done. SymbEL is not the place to do recursion. Again, if you feel like being tricky, don't be. It probably won't work.
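The observable behavior can be imitated in C with an explicit re-entry guard. This is a model of what the text says happens, not a description of how the interpreter detects the second call; the names one_active, lines_printed, and run_recursion_demo are invented for the sketch.

```c
#include <stdio.h>

static int one_active = 0;     /* models the run time's "already running" check */
static int lines_printed = 0;  /* counts output lines so the result can be checked */

static void two(void);

static void one(void)
{
    static int i = 0;          /* initialized once, as in SymbEL */

    if (one_active) {
        return;                /* re-entry detected: return before doing anything */
    }
    one_active = 1;
    i++;
    switch (i) {
    case 1:
        printf("Here I am\n");
        lines_printed++;
        break;
    case 2:
        printf("Here I am again\n");
        lines_printed++;
        one_active = 0;
        return;
    }
    two();
    one_active = 0;
}

static void two(void)
{
    one();                     /* indirect recursion back into one() */
}

/* Runs the scenario from the text once and reports how many lines were printed. */
int run_recursion_demo(void)
{
    one();
    return lines_printed;
}
```

With the guard in place, the nested call returns before i is incremented, so the second message is never reached and only one line is printed.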

Built-in Functions

SymbEL currently supports a limited set of built-in functions. As the need arises, more built-ins will be added. Many of these built-ins work the same as or similarly to the C library version. For a complete description of those functions, see the manual page for the C function. The current built-in functions are described below.

int fileno(int)

Send the return value of an fopen() or popen() to fileno() to retrieve the underlying file descriptor. This function works like the stdio macro, only it’s a built-in.

int fprintf(int, string, ...)

Print a formatted string onto the file defined by the first parameter. The man page for the fprintf C library function defines the use of this function in more detail.

int sizeof(...)

Return the size of the parameter. This parameter can be a variable, a numeric value, or an expression.

string itoa(int)

Convert an integer into a string.

string sprintf(string, ...)

Return a string containing the format and data specified by the parameters. This function is like the C library function in what it does, but it does not use a buffer parameter as the first argument. The buffer is returned instead. The function is otherwise like the C function.
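For readers more familiar with C, the "buffer is returned" shape can be approximated with a small wrapper around vsnprintf. This is a C sketch of the calling convention only; format_string is a hypothetical name, and unlike a SymbEL string, the result here must be freed by the caller.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated buffer and
 * return it instead of taking a buffer as the first argument.
 * The caller frees the result; NULL is returned on error. */
char *format_string(const char *fmt, ...)
{
    va_list ap;
    va_list ap_copy;
    int needed;
    char *buf;

    va_start(ap, fmt);
    va_copy(ap_copy, ap);
    needed = vsnprintf(NULL, 0, fmt, ap_copy);   /* measure only, write nothing */
    va_end(ap_copy);
    if (needed < 0) {
        va_end(ap);
        return NULL;
    }
    buf = malloc((size_t)needed + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)needed + 1, fmt, ap);
    }
    va_end(ap);
    return buf;
}
```

A call such as format_string("%s has %d CPU(s)", "host", 2) returns a newly allocated string holding the formatted text.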

struct prpsinfo_t first_proc(void)

In conjunction with the next_proc() function, traverse through all of the processes in the system. Depending on permissions, not all of the fields of the prpsinfo_t structure may be filled in. With root permissions, they are filled in completely.

struct prpsinfo_t get_proc(int)

Get a process by its process ID (instead of traversing all of the processes). The same rules regarding permissions apply to this function as to first_proc().

struct prpsinfo_t next_proc(void)

In conjunction with first_proc(), traverse through all of the processes on the system. When the pr_pid member of the prpsinfo_t structure is -1 after a return from this function, then all of the processes have been visited.

ulong kvm_address(VAR)

Return the kernel seek value for this variable. This function works only on variables designated as special kvm variables in the declaration.

ulong kvm_declare(string)

Declare a new kvm variable while the program is running. The return value is the kvm address of the variable, used as the second parameter to kvm_cvt.

void debug_off(void)

Turn debugging off until the next call to debug_on().

void debug_on(void)

Turn debugging on following this statement. Debugging information is printed out until the next call to debug_off().

void kvm_cvt(VAR, ulong)

Change the kernel seek value for this variable to the value specified by the second parameter.

void printf(string, ...)

Print a formatted string. Internally, a call to fflush() is made after every call to printf(). This call causes a write() system call; consider the effects of this call when writing SymbEL programs.

void signal(int, string)

Specify a signal catcher. The first parameter specifies the signal name according to the signal.se include file. The second parameter is the name of the SymbEL function to call upon receipt of the signal.

void struct_empty(STRUCT, ulong)

Dump the contents of the structure variable passed as the first parameter into the memory location specified by the second parameter. The binary value dumped will be the same format as when used in a C program.

void struct_fill(STRUCT, ulong)

Copy the data at the memory location specified by the second parameter into the structure variable passed as the first parameter. This function allows C-structure-format data to be translated into the internal representation of structures used by SymbEL.

void syslog(int, fmt, ...)

Log a message through the syslog() facility. Note that the %m string must be sent as %%m because the interpreter passes the format string through vsprintf() before passing it to syslog() internally.
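The reason for the doubled percent sign can be seen with any printf-style formatter in C: one formatting pass turns %% into a single %, so %m survives to the next stage. The sketch below shows only that first pass; build_syslog_format and the message text are made up for the example, and the call to syslog() itself is left out.

```c
#include <stdio.h>

/* Hypothetical helper: performs one printf-style pass over a format
 * that contains "%%m", as the text says the interpreter does before
 * handing the result to syslog(). Returns what snprintf returns. */
int build_syslog_format(char *out, size_t out_size)
{
    /* "%%" is an escaped percent sign, so this pass emits "%m" literally. */
    return snprintf(out, out_size, "cannot open file: %%m");
}
```

After the call, the buffer holds "cannot open file: %m", which is the form syslog() expects to receive.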

Dynamic Constants

The SymbEL interpreter deals with a few physical resources that have variable quantities on each computer on which the interpreter is run. These are the disks, network interfaces, CPUs, and devices that have an interrupt-counters structure associated with the device. It is often necessary to declare arrays that are bounded by the quantity of such a resource. When this is the case, a value is required that is sufficiently large to prevent subscripting errors when the script is running. This requirement is dealt with by means of dynamic constants. These constants can be used as integer values, and the interpreter views them as such. These dynamic constants are:

  • MAX_DISK — Maximum number of disk or disk-like resources

  • MAX_IF — Maximum number of network interfaces

  • MAX_CPU — Maximum number of CPUs

  • MAX_INTS — Maximum number of devices with interrupt counters

    These values are typically set to the number of discovered resources plus one. A single-CPU computer, for instance, will have a MAX_CPU value of 2. Run this script on your system and see what it says.

main() 
{
printf("MAX_DISK = %d\n", MAX_DISK);
printf("MAX_IF = %d\n", MAX_IF);
printf("MAX_CPU = %d\n", MAX_CPU);
printf("MAX_INTS = %d\n", MAX_INTS);
}

Attachable Functions

To keep “creeping featurism” from inflating the size and complexity of the interpreter, a mechanism had to be devised so that many procedures and functions could be “built in” without being built-ins.

The solution was to provide a syntactic remedy that defined a shared object that could be attached to the interpreter at run time. This declaration would include the names of functions contained in that shared object. Here is an example.

attach "libc.so" {
int puts(string s);
};
main()
{
puts("hello");
}

The attach statements are contained in se include files named after their C counterparts in /usr/include. The man page for fopen, for instance, specifies that the file stdio.h should be included to obtain its declaration. In SymbEL, the include file stdio.se is included to obtain the declaration inside an attach block.

Here are some rules governing the use of attached functions:

  • Only values that are four bytes long or less can be passed as parameters; longlong and double values cannot.

  • Structures can be passed, but they are sent as pointers to structures.

    The equivalent C representation of the SymbEL structure can be declared, and the parameters should then be declared as pointers to that type. The structure pointer parameter can then be used as it normally would be used. Although the structure parameter in the C code is a pointer, it still is not a reference parameter, and any changes made will not be copied back to the SymbEL variable.

  • Attached functions declaring a structure type as their return value are treated as if the function returns a pointer to that type.

    There is no way to declare an attached function that returns a structure, i.e., not a pointer to a structure, but a structure. The value returned is converted from the C representation into the internal SymbEL representation. No additional code is needed to convert the return value.

    Note that attached functions returning pointers to structures sometimes return zero (a null pointer) to indicate error or end-of-file conditions. Such functions should be declared as returning ulong, and their return value compared to zero. If a non-zero value is returned, the struct_fill built-in can be used to fill a structure. If an attached function is declared to return a structure and it returns zero, then a null pointer exception occurs in the program and the interpreter exits.

  • No more than 12 parameters can be passed.

  • Arrays passed to attached functions are passed by reference. The call

    fgets(buf, sizeof(buf), stdin);

    does exactly what it is expected to do. The buf parameter will be filled in by fgets directly because the internal representation of an array of characters is, not surprisingly, an array of characters. These semantics include passing arrays of structures. The SymbEL structure will be emptied before passing and filled upon return when sent to attached functions.

  • The rules for finding the shared library in an attach statement are the same as those defined in the man page for ld.

Ellipsis Parameter

For attached functions only, you can use the ellipsis parameter (...) to specify that there are an indeterminate number and type of parameters to follow. Values passed up until the ellipsis argument are type checked, but everything after that is not type checked. The ellipsis parameter allows functions like sscanf to work and therefore makes the language more flexible. For instance, the program

attach "libc.so" {
int sscanf(string buf, string format, ...);
};
main()
{
string buf_to_parse = "hello 1:15.16 f";
char str[32];
int n;
int i;
double d;
char c;
n = sscanf(buf_to_parse, "%s %d:%lf %c", str, &i, &d, &c);
printf("Found %d values: %s %d:%5.2lf %c\n", n, str, i, d, c);
}

yields the output

"Found 4 values: hello 1:15.16 f"

Attached Variables

Global variables contained in shared objects can be declared within an attach block with the keyword extern before the declaration. This declaration causes the value of the internal SymbEL variable to be read from and written to the variable in the shared object as it is used in the execution of the program.

Here is an example of the declaration of getopt, with its global variables optind, opterr, and optarg from the include file stdlib.se.

attach "libc.so" {
int getopt(int argc, string argv[], string optstring);
extern int optind;
extern int opterr;
extern string optarg;
};

This code works for all types, including structures.

Built-in Variables

Although global variables in shared objects can be attached with the extern notation, there are three very special cases of variables that cannot be attached this way. These variables are stdin, stdout, and stderr. These “variables” in C are actually #define directives in the stdio.h include file; they reference the addresses of structure members. Since the address of structures cannot be taken in SymbEL, there is no way to represent these so-called variables. They are, therefore, provided by the interpreter as built-in variables. They can be used without any declaration or include file.

Parameters to main and Its Return Value

In C programs, the programmer can declare main as accepting three parameters:

  • An argument count (usually argc)

  • An argument vector (usually argv)

  • An environment vector (usually envp)

Similarly, the SymbEL main function can be declared as accepting two of these parameters, argc and argv. Here is an example that uses these variables.

main(int argc, string argv[]) 
{
int i;
for(i=0; i<argc; i++) {
printf("argv[%d] = %s\n", i, argv[i]);
}
}

This example also demonstrates the use of an empty array declaration. When this program is run with the command

se test.se one two three four five six

the resulting output is

argv[0] = test.se 
argv[1] = one
argv[2] = two
argv[3] = three
argv[4] = four
argv[5] = five
argv[6] = six

It is not necessary to declare these parameters to main. If they are not declared, then the interpreter does not send any values for them.

It is also possible to declare main as being an integer function. Although the exit function can be used to exit the application with a specific code, the value can also be returned from main. In this case, the previous example would be:

int main(int argc, string argv[]) 
{
int i;
for(i=0; i<argc; i++) {
printf("argv[%d] = %s\n", i, argv[i]);
}
return 0;
}

The value returned by the return statement is the code that the interpreter exits with.

Structures

SymbEL supports the aggregate type struct, which is similar to the C variety, with some exceptions. An aggregate is a collection of potentially dissimilar objects collected into a single group. As it turns out, most of the SymbEL code developed will contain structures.

As an example, here is what a SymbEL password file entry might look like.

struct passwd {
string pw_name;
string pw_passwd;
long pw_uid;
long pw_gid;
string pw_age;
string pw_comment;
string pw_gecos;
string pw_dir;
string pw_shell;
};

The declaration of structure variables differs from C in that the word struct is left out of the variable declaration. So, to declare a variable of type struct passwd, only passwd would be used.

Accessing Structure Members

You access a structure member with dot notation. The first part of the variable is the variable name itself, followed by a dot and then the structure member in question. To access the pw_name member of the passwd structure above, the code could look like this.

main() 
{
passwd pwd;
pwd.pw_name = "richp";
...
}

Structure members can be any type, including other structures. A structure may not contain a member of its own type. If it does, the parser posts an error.

Arrays of Structures

Declaration of arrays of structures is the same as for any other type, with the provision stated in the previous paragraph. Notation for accessing members of an array of structures is name[expression].member.

Structure Assignment

The assignment operation is available to variables of the same structure type.

Structure Comparison

Comparison of variables of structure type is not supported.

Structures as Parameters

Variables of structure type can be passed as parameters. As with other parameters, they are passed by value so the target function can access its structure parameter as a local variable.

Arrays of structures passed to other SymbEL functions are also passed by value. This is not the case with passing arrays of structures to attached functions (see “Attachable Functions” on page 532).

Structures as Return Values of Functions

Functions can return structure values. Assigning a variable the value of the result of a function call that returns a structure is the same as a structure assignment between two variables.

Language Classes

The preceding sections have discussed the basic structure of SymbEL. The remainder of this chapter discusses the features that make SymbEL powerful as a language for extracting, analyzing, and manipulating data from the kernel.

When generalizing a capability, the next step after creation of a library is the development of a syntactic notation which represents the capability that the library provided. The capability in question here is the retrieval of data from the sources within the kernel that provide performance tuning data. SymbEL provides a solution to this problem through the use of predefined language classes that can be used to declare the type of a variable and to designate it as being a special variable. When a variable with this special designation is accessed, the data from the source that the variable represents is extracted and placed into the variable before it is evaluated.

There are four predefined language classes in SymbEL:

  • kvm — Access to any global kernel symbol

  • kstat — Access to any information provided by the kstat framework

  • mib — Read-only access to the MIB2 variables in the IP, ICMP, TCP, and UDP modules in the kernel

  • ndd — Access to variables provided by the IP, ICMP, TCP, UDP, and ARP modules in the kernel

Variables of these language classes have the same structure as any other variable. They can be a simple type or a structured type. What needs clarification in the declaration of the variable is

  • Whether the variable type is simple or structured

  • Whether the variable has a predefined language class attribute

The syntax selected for this capability defines the variable with a name that is the concatenation of the language class name and a dollar sign ($). This convention allows these prefixes for variables to denote their special status.

kvm$      kvm language class
kstat$    kstat language class
mib$      mib language class
ndd$      ndd language class

Examples of variables declared with a special attribute are:

ks_system_misc kstat$misc;    // structured type, kstat language class 
int kvm$maxusers; // simple type, kvm language class
mib2_ip_t mib$ip; // structured type, mib language class
ndd_tcp_t ndd$tcp; // structured type, ndd language class

When any of these variables appear in a statement, the values that the variables represent are retrieved from the respective source before the variable is evaluated. Variables declared of the same type but not possessing the special prefix are not evaluated in the same manner. For instance, the variable

ks_system_misc tmp_misc;  // structured type, no language class specified

can be accessed without any data being read from the kstat framework.

Variables that use a language class prefix in their name are called active variables. Those that do not are called inactive variables.

The kvm Language Class

Let’s look at an example of the use of a kvm variable.

main() 
{
int kvm$maxusers;
printf("maxusers is set to %d\n", kvm$maxusers);
}

In this example, there is a local variable of type int. The fact that it is an int is not exceptional. The fact that the name of the variable begins with kvm$ is exceptional. It is the kvm$ prefix that flags the interpreter to look up this value in the kernel via the kvm library. The actual name of the kernel variable is whatever follows the kvm$ prefix. The program need not take special action to read the value from the kernel. Simply accessing the variable by using it as a parameter to the printf() statement (in this example) causes the interpreter to read the value from the kernel and place it in the variable before sending the value to printf(). Use of kvm variables is somewhat limiting since the effective uid of se must be superuser or the effective gid must be sys in order to successfully use the kvm library.

In this example, the variable maxusers is a valid variable in the kernel and when accessed is read from the kernel address space. It is possible and legal to declare a kvm$ active variable with the name of a variable that is not in the kernel address space. The value will contain the original initialized value, and refreshing of this type of variable is futile because there is no actual value in the kernel. This technique is useful when dealing with pointers, though, and an example is included in “Using kvm Variables and Functions” on page 553.

The kstat Language Class

The use of kstat variables differs from the use of kvm variables in that all of the kstat types are defined in the header file kstat.se. All kstat variables must be structures because this is how they are defined in the header file. Declaration of an active kstat variable that is not a structure results in a semantic error. Declaration of an active kstat variable that is not of a type declared in the kstat.se header file results in the variable always containing zeros unless the program manually places something else in the variable. Here is an example of using kstat variables.

#include <kstat.se> 
main()
{
ks_system_misc kstat$misc;
printf("This machine has %u CPU(s) in it.\n", kstat$misc.ncpus);
}

Just as in the kvm example, no explicit access need be done to retrieve the data from the kstat framework. The access to the member of the active ks_system_misc variable in the parameter list of printf() causes the member to be updated by the run time.

Multiple Instances

The kstat.se header file contains many structures that have information that is unique in nature. The ks_system_misc structure is an example.

The number of CPUs on the system is unique and does not change depending on something else. However, the activity of each of the individual CPUs does change, depending on which CPU is in question. This is also the case for network interfaces and disks. This situation is handled by adding two members to the structures that contain data for devices that have multiple instances. These members are name$ and number$.

The name$ member contains the name of the device as supplied by kstat. The number$ member is a linear number representing the nth device of this type encountered. It is not the device instance number. This representation allows a for loop to be written such that all of the devices of a particular type can be traversed without the need to skip over instances that are not in the system. It is not unusual, for instance, for a multiprocessor machine to contain CPUs that do not have linear instance numbers. When traversing through all the devices, the program will encounter the end of the list when the number$ member contains a -1. Here is an example of searching through multiple disk instances.

#include <kstat.se>
main()
{
    ks_disks kstat$disk;
    printf("Disks currently seen by the system:\n");
    for(kstat$disk.number$=0; kstat$disk.number$ != -1; kstat$disk.number$++)
    {
        printf("\t%s\n", kstat$disk.name$);
    }
}


In this program, kstat$disk.number$ is set initially to zero. The “while part” of the loop is then run, checking the value of kstat$disk.number$ to see if it’s -1. That comparison causes the run time to verify that there is an nth disk. If there is, then the number$ member is left with its value and the body of the loop runs. If there is not, the number$ member contains -1 and the loop ends. When the run time evaluates the kstat$disk.name$ value in the printf() statement, it reads the name of the nth disk and places it in the name$ member, which is then sent to printf().
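The difference between the linear number$ value and the device instance number can be pictured with a small C model. Everything here is illustrative: the instance table, check_number, and sum_instances are invented, and the real lookup is done by the run time against the kstat framework.

```c
/* Hypothetical table of CPU instance numbers on a machine whose
 * instance numbers are not contiguous. */
static const int cpu_instances[] = { 0, 1, 4, 5 };
static const int cpu_count = sizeof cpu_instances / sizeof cpu_instances[0];

/* Models the run time's check of number$: returns n if an nth device
 * exists, or -1 once the end of the list is reached. */
int check_number(int n)
{
    return (n >= 0 && n < cpu_count) ? n : -1;
}

/* Walks every device with a linear index, the way the for loop in the
 * text does, and returns the sum of the instance numbers visited. */
int sum_instances(void)
{
    int sum = 0;
    int n;

    for (n = check_number(0); n != -1; n = check_number(n + 1)) {
        sum += cpu_instances[n];
    }
    return sum;
}
```

The loop counts 0, 1, 2, 3 and never has to know that instances 2 and 3 are absent, which is the convenience the number$ member is meant to provide.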

Other Points About kstat

Here are some points about how to best use kstat variables in a program.

Some of the values contained in the kstat structures are not immediately useful by themselves. For instance, the cpu member of the ks_cpu_sysinfo structure is an array of four unsigned longs representing the number of clock ticks that have occurred since system boot in each of the four CPU states: idle, user, kernel, and wait. This raw data must be processed, for example by comparing two samples, to be useful.
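One common way to make cumulative tick counters useful is to take two samples and express the difference in each state as a percentage of the total. The C sketch below shows that arithmetic under the assumption that the four counters are in the order the text lists (idle, user, kernel, wait); cpu_percentages and CPU_STATES are names chosen for the example, not part of SymbEL or kstat.

```c
#define CPU_STATES 4   /* idle, user, kernel, wait, in the order the text lists */

/* Converts two samples of cumulative tick counters into percentages
 * for the interval between them. Returns 0 on success, or -1 if no
 * ticks elapsed (nothing to divide by). */
int cpu_percentages(const unsigned long prev[CPU_STATES],
                    const unsigned long curr[CPU_STATES],
                    double pct[CPU_STATES])
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < CPU_STATES; i++) {
        total += curr[i] - prev[i];
    }
    if (total == 0) {
        return -1;
    }
    for (i = 0; i < CPU_STATES; i++) {
        pct[i] = 100.0 * (double)(curr[i] - prev[i]) / (double)total;
    }
    return 0;
}
```

For example, deltas of 50, 30, 15, and 5 ticks over an interval give 50% idle, 30% user, 15% kernel, and 5% wait.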

If a program needs to access many members of a kstat variable, then it is in the best interest of the performance of the program and the system to copy the values into an inactive kstat variable by using a structure assignment. The single structure assignment causes all of the members of the structure to be read from the kstat framework with one read and then copied to the inactive variable. When these values are accessed by the inactive variable, no more reads from the kstat framework will be initiated. The net result is a reduction in the number of system calls being performed by the run time, and therefore se does not have a significant impact on the performance of the system. Here is an example.

Example kstat Program

#include <unistd.se>
#include <sysdepend.se>
#include <kstat.se>

main()
{
    ks_cpu_sysinfo kstat$cpusys; // active kstat variable
    ks_cpu_sysinfo tmp_cpusys; // inactive kstat variable
    ks_system_misc kstat$misc; // active kstat variable
    int ncpus = kstat$misc.ncpus; // grab it and save it
    int old_ints[MAX_CPU];
    int old_cs[MAX_CPU];
    int ints;
    int cs;
    int i;

    // initialize the old values
    for(i=0; i<ncpus; i++) {
        kstat$cpusys.number$ = i; // does not cause an update
        tmp_cpusys = kstat$cpusys; // struct assignment, update performed
        old_ints[i] = tmp_cpusys.intr; // no update, inactive variable
        old_cs[i] = tmp_cpusys.pswitch; // no update, inactive variable
    }
    for(;;) {
        sleep(1);
        for(i=0; i<ncpus; i++) {
            kstat$cpusys.number$ = i; // does not cause an update
            tmp_cpusys = kstat$cpusys; // struct assignment, update performed
            ints = tmp_cpusys.intr - old_ints[i];
            cs = tmp_cpusys.pswitch - old_cs[i];

            printf("CPU: %d cs/sec = %d int/sec = %d\n", i, cs, ints);

            old_ints[i] = tmp_cpusys.intr;
            old_cs[i] = tmp_cpusys.pswitch; // save old values
        }
    }
}


About the Program

ks_cpu_sysinfo kstat$cpusys;   // active kstat variable
ks_cpu_sysinfo tmp_cpusys; // inactive kstat variable

This code is the declaration of the active and inactive variable. Use of the active variable causes the run time to read the values from the kstat framework for the ks_cpu_sysinfo structure. Later accesses to the inactive variable do not cause the reads to occur.

ks_system_misc kstat$misc;     // active kstat variable 
int ncpus = kstat$misc.ncpus; // grab it and save it

Since the ncpus variable will be used extensively, it is best to put the value into a variable that does not cause continual updates.

int old_ints[MAX_CPU]; 
int old_cs[MAX_CPU];

Since the program computes the rate at which interrupts and context switches are occurring, the values from the previous iteration need to be saved so they can be subtracted from the values of the current iteration. They are arrays bounded by the maximum number of CPUs available on a system.

// initialize the old values 
for(i=0; i<ncpus; i++) {
kstat$cpusys.number$ = i; // does not cause an update
tmp_cpusys = kstat$cpusys; // struct assignment, update performed
old_ints[i] = tmp_cpusys.intr; // no update, inactive variable
old_cs[i] = tmp_cpusys.pswitch; // no update, inactive variable
}

This code grabs the initial values that will be subtracted from the current values after the first sleep() is completed. For simplicity, no timers are kept, and it is assumed that only one second has elapsed between updates. In practice, the elapsed time would be computed.

for(i=0; i<ncpus; i++) {
kstat$cpusys.number$ = i; // does not cause an update
tmp_cpusys = kstat$cpusys; // struct assignment, update performed

Here, the number$ member is set to the CPU in question, and then the contents of the entire active structure variable are copied into the inactive structure variable. This coding causes only one system call to update the kstat variable.

ints = tmp_cpusys.intr - old_ints[i]; 
cs = tmp_cpusys.pswitch - old_cs[i];

printf("CPU: %d cs/sec = %d int/sec = %d\n", i, cs, ints);

old_ints[i] = tmp_cpusys.intr;
old_cs[i] = tmp_cpusys.pswitch; // save old values

This code computes the number of interrupts and context switches for the previous second and prints it out. The current values are then saved as the old values, and the loop continues.
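The walkthrough notes that a real program would compute the elapsed time rather than assume exactly one second. The arithmetic for that is shown below as a C sketch; rate_per_second is a hypothetical helper, and how the elapsed time is measured (for instance, from timestamps taken around the sleep) is left to the caller.

```c
/* Hypothetical helper: per-second rate from two counter samples and the
 * measured time between them. Returns 0.0 if no time has elapsed. */
double rate_per_second(unsigned long prev, unsigned long curr,
                       double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0) {
        return 0.0;
    }
    return (double)(curr - prev) / elapsed_seconds;
}
```

If 300 interrupts are counted over a measured 1.5 seconds, the rate is 200 per second, where the fixed one-second assumption would have reported 300.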

Runtime Declaration of kstat Structures

The kstat framework is dynamic and contains information regarding devices attached to the system. These devices are built by Sun and by third-party manufacturers. The interpreter contains static definitions of many devices, and these definitions are mirrored by the kstat.se include file. However, it is unreasonable to assume that the interpreter will always contain all of the possible definitions for devices. To accommodate this situation, a syntactic element was needed. This is the kstat structure.

A kstat structure can define only KSTAT_TYPE_NAMED structures, which are the structures that define devices such as network interfaces. As an example, the following script prints out the values of a kstat structure that is not declared in the kstat.se file but has been part of the kstat framework since the very beginning.

kstat struct "kstat_types" ks_types {
ulong raw;
ulong "name=value";
ulong interrupt;
ulong "i/o";
ulong event_timer;
};
main()
{
ks_types kstat$t;
ks_types tmp = kstat$t;
printf("raw = %d\n", tmp.raw);
printf("name=value = %d\n", tmp.name_value);
printf("interrupt = %d\n", tmp.interrupt);
printf("i/o = %d\n", tmp.i_o);
printf("event_timer = %d\n", tmp.event_timer);
}

The kstat structure introduces a few new concepts:

  • The structure starts with the word "kstat" to denote its significance.

  • The structure also contains members that are quoted. Quoted members work only for kstat structures and do not work in an ordinary structure declaration. Quoted members enable programmers to declare variables that accurately reflect the name of the member within the kstat framework. For instance, the member "name=value" could not be declared without quotes since the parser would generate errors. When accessed in the printf() statement, special characters are translated to underscores. This is the case for any character that is recognized as a token and also for spaces. The characters that will be translated to underscores are:

    []{}()@|!#;:.,+*/=-><~%? \t\n\\^
  • Members of KSTAT_TYPE_NAMED structures sometimes have no name. This situation will also be correctly handled by the interpreter. Any member of a structure with the name "" is changed to missingN, where N starts at 1 and increments for each occurrence of a missing member name. A declaration of

    kstat struct "asleep" ks_zzzz {
    ulong ""; // translates into missing1
    };

    translates into

    kstat struct "asleep" ks_zzzz {
    ulong missing1;
    };

    for the purposes of the programmer. It is a good idea to document such declarations, as shown above.

  • Members with reserved words as names are also munged into another form—the prefix SYM_ is added to the name. For instance, this declaration

    kstat struct "unnecessary" ks_complexity {
    short "short";
    };

    is munged into

    kstat struct "unnecessary" ks_complexity {
    short SYM_short;
    };

    so you can continue.

  • The quoted string following the keyword struct in the declaration represents the name of the KSTAT_TYPE_NAMED structure in the kstat framework and is an algebra unto itself. First, an introduction.

    Each “link” in the kstat “chain” that composes the framework has three name elements: a module, an instance number, and a name. The "kstat_types" link, for instance, has the complete name "unix".0."kstat_types". "unix" is the module, 0 is the instance number, and "kstat_types" is the name. Here are the possible ways to specify the kstat name within this quoted string.

    • "kstat_types" — The “name” of the kstat.

    • "cpu_info:" — The “module” of the kstat. A link with the full name of "cpu_info".0."cpu_info0" would map onto this structure. However, so too would "cpu_info".1."cpu_info1", and this case brings up an issue. When a kstat structure is declared with a kstat module name, the first two members of the structure must be:

      long number$; 
      string name$;

      This requirement is in keeping with other kstat declarations with multiple instances. In the case of structures with multiple module names that have the same structure members, the list of names continues with colon separators, for example:

      kstat struct "ieef:el:elx:pcelx" ks_elx_network { ...
    • "*kmem_magazine" — The prefix of the name portion of the kstat. In the case of the kmem_magazines, the module name is always "unix", which is the module name of many other links that do not share the same structure members as the kmem_magazines. As is the case with specifying a module name, the number$ and name$ members must be present.

Note that when a dynamic kstat structure declaration replaces a static declaration inside of the interpreter, the old declaration is discarded and replaced with the new one. Therefore, if a kmem_magazine declaration were used to replace the "ks_cache" declaration from kstat.se, the only kstat links seen would be the kmem_magazine members and all of the other cache links (and there are a lot of them) would no longer be seen.
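The member-name translation rules listed earlier in this section (special characters become underscores, unnamed members become missingN, reserved words gain the SYM_ prefix) can be modeled in a few lines of C. This is only a sketch of the rules as the text states them, not the interpreter's code: munge_member_name is a hypothetical helper, the reserved-word table is an illustrative subset, and the order in which the rules are applied is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* The characters the text lists as being translated to underscores. */
static const char SPECIAL_CHARS[] = "[]{}()@|!#;:.,+*/=-><~%? \t\n\\^";

/* Illustrative subset only; the real reserved-word list is not given in the text. */
static const char *const RESERVED_WORDS[] = { "short", "int", "long", "struct" };

/* Hypothetical helper: writes the translated form of name into out.
 * missing_count remembers how many unnamed members have been seen so far.
 * out_size is assumed to be greater than zero. */
void munge_member_name(const char *name, int *missing_count, char *out, size_t out_size)
{
    size_t i;

    if (name[0] == '\0') {                       /* unnamed member */
        *missing_count += 1;
        snprintf(out, out_size, "missing%d", *missing_count);
        return;
    }
    for (i = 0; i < sizeof(RESERVED_WORDS) / sizeof(RESERVED_WORDS[0]); i++) {
        if (strcmp(name, RESERVED_WORDS[i]) == 0) {   /* reserved word */
            snprintf(out, out_size, "SYM_%s", name);
            return;
        }
    }
    snprintf(out, out_size, "%s", name);         /* copy, then patch in place */
    for (i = 0; out[i] != '\0'; i++) {
        if (strchr(SPECIAL_CHARS, out[i]) != NULL) {
            out[i] = '_';
        }
    }
}
```

Applied to the kstat_types example, "name=value" becomes name_value and "i/o" becomes i_o, which are the names used in the printf() calls of that script.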

Adding New Disk Names

You can use an internal function, se_add_disk_name( string name ), to add new disk names to the existing list internally. Therefore, if the tape drives and nfs mounts that are recorded in the KSTAT_TYPE_IO section of the kstat framework were to be added to the list of disks for display by any script that shows disk statistics, you could add these lines at the beginning of the script.

se_add_disk_name("st"); 
se_add_disk_name("nfs");

This function is declared in the se.se include file.

The mib Language Class

A lot of data regarding the network resides in the mib variables of the kernel. Unfortunately, these mib variables are not part of the kstat framework. Therefore, a new language class was created to facilitate access to this information.

Variables of the mib class have a unique feature in that they can be read, but assigning values generates a warning from the interpreter. This warning is to remind you that assigning values to the members of the mib2_* structures will not result in the information being placed back into the kernel. The mib variables are read-only.

mib variables do not have the permissions limitation of kvm variables. Any user can view mib variable values without special access permissions.

To view the mib information available from within SymbEL, run the command netstat -s from the command line. All but the IGMP information is available.

Since all mib variables are structures, the rules regarding structure assignment being used to cut down on the overhead of the interpreter are the same as for the kstat and kvm classes. Here is an example of using mib class variables.

#include <mib.se> 
main()
{
mib2_tcp_t mib$tcp;
printf("Retransmitted TCP segments = %u\n", mib$tcp.tcpRetransSegs);
}

The ndd Language Class

SunOS 5.x provides access to variables that define the operation of the network stack through a command called ndd (see ndd(1M)). The ndd language class within SymbEL provides access to the variables within the IP, ICMP, TCP, UDP, and ARP modules. The definitions of the available variables are in the ndd.se include file. For each module, there is a structure that contains all of the variables available for that module.

Some of these variables are read-write and others are read-only. If you try to modify a variable that is read-only, the interpreter posts a warning message. Some of the read-only variables are tables that can be quite large. Note that the largest table size that can be handled is 64 kilobytes (65,536 bytes). If an ndd variable is larger than 64 kilobytes, it is truncated.

Like kstat and mib variables, all ndd variables are structures.

The following program displays the tcp_status variable of the TCP module. This variable is type string and when printed looks like a large table.

#include <stdio.se> 
#include <ndd.se>
main()
{
ndd_tcp_t ndd$tcp;
puts(ndd$tcp.tcp_status);
}

User-Defined Classes

The four language classes provide a significant amount of data to a program for analysis. But the analysis of this data can become convoluted and make the program difficult to deal with. This is one of the problems that SymbEL hoped to clear up. Adding more language classes is a potential solution to this problem.

An example of an additional language class that would be useful is a vmstat class. This would be a structure that provided all of the information that the vmstat program provides. The problem is that such an addition would make se larger and provide functionality that didn't really require the internals of the interpreter to accomplish. All of what vmstat does can be done by a SymbEL program.

In addition to the vmstat class, it would be useful to have classes for iostat, mpstat, nfsstat, netstat, and any other “stat” program that provided this type of statistical information. What was needed to accomplish this task correctly was a language feature that allowed programmers to create their own language classes in SymbEL. This “user defined class” would be a structure and an associated block of code that was called whenever one of the members of the structure was accessed. This idea led to the development of the aggregate type class.

A class type is a structure that contains a block of code. The block is first called when the block that contains the declaration of the class variable is entered. Thereafter, whenever a member of the class variable is accessed, the block is called again. To illustrate the class construct, here is a program that continually displays how long a system has been up. The first example is without the use of a class.

#include <stdio.se> 
#include <unistd.se>
#include <kstat.se>

#define MINUTES (60 * hz)
#define HOURS (60 * MINUTES)
#define DAYS (24 * HOURS)

main()
{
ulong ticks;
ulong days;
ulong hours;
ulong minutes;
ulong seconds;
ks_system_misc kstat$misc;
long hz = sysconf(_SC_CLK_TCK);

for(;;) {
ticks = kstat$misc.clk_intr;
days = ticks / DAYS;
ticks -= (days * DAYS);
hours = ticks / HOURS;
ticks -= (hours * HOURS);
minutes = ticks / MINUTES;
ticks -= (minutes * MINUTES);
seconds = ticks / hz;
printf("System up for: %4u days %2u hours %2u minutes %2u seconds\r",
days, hours, minutes, seconds);
fflush(stdout);
sleep(1);
}
}


This program continues in an infinite for loop, computing the uptime based on the number of clock ticks the system has received since boot. The computation is contained completely within the main program. This code can be distilled into a user-defined class, as the following code shows.

#include <unistd.se> 
#include <kstat.se>

#define MINUTES (60 * hz)
#define HOURS (60 * MINUTES)
#define DAYS (24 * HOURS)

class uptime {

ulong ticks;
ulong days;
ulong hours;
ulong minutes;
ulong seconds;

uptime$()
{
ks_system_misc kstat$misc;
long hz = sysconf(_SC_CLK_TCK);

ticks = kstat$misc.clk_intr; /* assign these values to the */
days = ticks / DAYS; /* class members */
ticks -= (days * DAYS);
hours = ticks / HOURS;
ticks -= (hours * HOURS);
minutes = ticks / MINUTES;
ticks -= (minutes * MINUTES);
seconds = ticks / hz;
}
};


The start of the class looks like a structure, but the final “member” of the structure is a block of code called the “class block.” The name used after the class keyword is the type name that will be used in the declaration of the variable. The name of the class block is the prefix used in variable names to denote that the variable is active. Variables declared in a user-defined class type that do not use the prefix in the variable name are inactive.

The main() function of the uptime program would now be written to use the uptime class as shown in this example.

#include <stdio.se> 
#include <unistd.se>
#include "uptime_class.se"

main()
{
uptime uptime$value;
uptime tmp_uptime;

for(;;) {
tmp_uptime = uptime$value;
printf("System up for: %4u days %2u hours %2u minutes %2u seconds\r",
tmp_uptime.days, tmp_uptime.hours,
tmp_uptime.minutes, tmp_uptime.seconds);
fflush(stdout);
sleep(1);
}
}

The previous section discussed how the assignment of entire structures cuts down on the overhead of the system because only one copy is required. Not only is this true here as well, but the structure copy also ensures that the data printed out represents the calculations of one snapshot in time, instead of printing different values for each time that the class block was called to update each member of the class that was used as a parameter to printf().
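The snapshot argument can be made concrete with a small C model. This is a sketch under stated assumptions, not SymbEL: refresh_uptime stands in for the class block, fake_ticks is a made-up counter that ticks once per second, and every name here is hypothetical.

```c
/* Stand-in for the kernel's tick counter (assumed to tick once per second). */
unsigned long fake_ticks = 59;

/* Counts how many times the model "class block" has run. */
int refresh_count = 0;

struct uptime_model {
    unsigned long minutes;
    unsigned long seconds;
};

/* Plays the role of the class block: every call takes a fresh reading
 * and recomputes all members from it. */
struct uptime_model refresh_uptime(void)
{
    struct uptime_model u;
    unsigned long ticks = fake_ticks;

    fake_ticks += 59;        /* time moves on between readings */
    refresh_count += 1;
    u.minutes = ticks / 60;
    u.seconds = ticks % 60;
    return u;
}
```

Reading refresh_uptime().minutes and then refresh_uptime().seconds runs the block twice and mixes two readings (0 minutes from the first, 58 seconds from the second). Copying the whole structure once runs the block a single time and yields a consistent 0 minutes 59 seconds.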

Pitfalls

Here are some of the idiosyncrasies of the language that will catch programmers by surprise if they’re accustomed to using a particular feature in C and assume that it will be supported in SymbEL.

  • Only one variable can be declared per line. The variable names may not be a comma-separated list.

  • There is no type float. All floating-point variables are type double.

  • Curly braces must surround all sequences of statements in control structures, including sequences of length one.

  • The comparators work with scalars, floats, and strings. Therefore, the logical comparison ("hello" == "world") is valid and in this case returns false.

  • If the result of an expression yields a floating value as an operand to the modulus operator, that value is converted to long before the operation takes place. This conversion occurs while the program is running.

  • Assignment of the result of a logical expression is not allowed.

  • The for loop has some limitations.

    • There can be only one assignment in the assignment part.

    • There can be only logical expressions in the while part.

    • There can be only one assignment in the do part.

  • All local variables have static semantics.

  • All parameters are passed by value.

  • Global variables can be assigned the value of a function call.

  • while(running) is not syntactically correct. while(running != 0) is correct.

  • There is no recursion in SymbEL.

  • Structure comparison is not supported.

  • Syntax of conditional expressions is rigid: ( condition ? do_exp : else_exp )

  • Calling attached functions with incorrect values can result in a core dump and is not avoidable by the interpreter. This simple but effective script will cause a segmentation fault core dump:

#include <stdio.se> 
main()
{
puts(nil);
}
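The static semantics of local variables is the pitfall most likely to surprise a C programmer, so here is what it looks like in C itself, where the keyword must be written out. This is a sketch for comparison only; both function names are hypothetical. According to the text, a SymbEL local behaves like the first function even without the keyword.

```c
/* Behaves the way the text says every SymbEL local behaves. */
int next_ticket(void)
{
    static int ticket = 100;   /* initialized once, before the first call */

    ticket += 1;
    return ticket;             /* value carries over to the next call */
}

/* An ordinary C automatic local, for contrast. */
int next_ticket_auto(void)
{
    int ticket = 100;          /* re-initialized on every call */

    ticket += 1;
    return ticket;
}
```

Two successive calls to next_ticket() return 101 and then 102, while next_ticket_auto() returns 101 both times.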

Tricks

As the creator of a programming language and the developer of the interpreter, it is much easier for me to see through the intricacies of the features to underlying functionality of the interpreter itself. This knowledge manifests itself in programming “tricks” that allow certain operations to be done that may not be obvious. Here are some that I’ve used. If there’s something you need done and it doesn’t seem to fit into any language feature, try to work around it. You may find a loophole that you didn’t know existed.

Returning an Array of Nonstructured Type from a Function

Although it is not allowed to declare a function as

int [] 
not_legal()
{
int array[ARRAY_SIZE]={1,2,3,4,5,-1};
return array;
}

it is still possible to return an array. Granted, this code is unattractive, but most of the tricks in this section involve something that is not very appealing from the programming standpoint. SymbEL is, after all, just a scripting language. And if it can be done at all, it’s worth doing. So, here’s how to return an array of nonstructured type from a function.

#define ARRAY_SIZE 128 
ulong
it_is_legal()
{
int array[ARRAY_SIZE]={1,2,3,4,5,-1};
return &array;
}
struct array_struct {
int array[ARRAY_SIZE];
};
main()
{
array_struct digits;
ulong address;
int i;
address = it_is_legal();
struct_fill(digits, address);
for(i=0; digits.array[i] != -1; i++) {
printf("%d\n", digits.array[i]);
}
}

Using Return Values from Attached Functions

It is common to read input lines by using fgets, then locate the newline character with strchr and change it to a null character. This approach has unexpected results in SymbEL. For instance, the code segment

while(fgets(buf, sizeof(buf), stdin)!=nil){
p = strchr(buf, '\n');
p[0] = '\0';
puts(buf);
}

would be expected to null the newline character and print the line (yes, I know this code segment will cause se to exit with a null pointer exception if a line is read with no newline character). But this is not the case because the strchr function will return a string that is assigned to the variable p. When this happens, a new copy of the string returned by strchr is allocated and assigned to p. When the p[0] = '\0'; line is executed, the newline character in the copy is made null. The original buf from the fgets call remains intact. The way around this result (and this workaround should be done only when it is certain that the input lines contain the newline character) is:

while(fgets(buf, sizeof(buf), stdin)!=nil){
strcpy(strchr(buf, '\n'), "");
puts(buf);
}

In this case, the result of the strchr call is never assigned to a variable, and its return value remains uncopied before being sent to the strcpy function. strcpy then copies the string "" onto the newline and in doing so, changes it to the null character.
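The difference between the two loops can be modeled in C by making the hidden copy explicit. This is a sketch of the behavior the text describes, not of the interpreter's internals; chop_via_copy and chop_in_place are hypothetical names, and the NULL checks are an addition that covers the no-newline case the text warns about.

```c
#include <stdlib.h>
#include <string.h>

/* Models "p = strchr(buf, '\n'); p[0] = '\0';" as SymbEL runs it:
 * the assignment to p allocates a copy, so buf itself is never changed. */
void chop_via_copy(char *buf)
{
    char *found = strchr(buf, '\n');

    if (found != NULL) {
        size_t len = strlen(found) + 1;
        char *p = malloc(len);          /* the copy made by the assignment */

        if (p != NULL) {
            memcpy(p, found, len);
            p[0] = '\0';                /* only the copy is modified */
            free(p);
        }
    }
}

/* Models the workaround: the strchr result is passed straight to strcpy. */
void chop_in_place(char *buf)
{
    char *found = strchr(buf, '\n');

    if (found != NULL) {
        strcpy(found, "");              /* overwrites the newline in buf */
    }
}
```

Given the buffer "hello\n", chop_via_copy leaves it unchanged and chop_in_place shortens it to "hello".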

Using kvm Variables and Functions

Using the kvm functions and dealing with kvm variables in general is quite confusing because there are so many levels of indirection of pointers. This simple script performs the equivalent of /bin/uname -m.

#include <stdio.se> 
#include <devinfo.se>

main()
{
ulong kvm$top_devinfo; // top_devinfo is an actual kernel variable
dev_info_t kvm$root_node; // root_node is not, but it needs to be active

// The next line effects a pointer indirection. The value of top_devinfo
// is a pointer to the root of the devinfo tree in the kernel. This value
// is extracted, and the root_node variable has its kernel address changed
// to this value. Accessing the root_node variable after this assignment
// will cause the reading of the dev_info_t structure from the kernel
// since root_node is an active variable. Note that root_node is not
// a variable in the kernel though, but it's declared active so that
// the value will be read out *after* it's given a valid kernel address.
// And there's no need to explicitly read the string, it's done already.
kvm_cvt(kvm$root_node, kvm$top_devinfo);
puts(kvm$root_node.devi_name);
}


Another example of extracting kvm values is with the kvm_declare function. This function allows kernel variables to be declared while the program is running. Instead of declaring a kvm variable for maxusers, for instance, you could do it this way:

main() 
{
ulong address;
int kvm$integer_value;

address = kvm_declare("maxusers");
kvm_cvt(kvm$integer_value, address);
printf("maxusers is %d\n", kvm$integer_value);
}

A more general way to peruse integer variables entered at the user’s leisure is shown in this example.

#include <stdio.se> 
#include <string.se>

int main()
{
char var_name[BUFSIZ];
ulong address;
int kvm$variable;

for(;;) {
fputs("Enter the name of an integer variable: ", stdout);
if (fgets(var_name, sizeof(var_name), stdin) == nil) {
return 0;
}
strcpy(strchr(var_name, '\n'), ""); // chop
address = kvm_declare(var_name); // look it up with nlist
if (address == 0) {
printf("variable %s is not found in the kernel space\n", var_name);
continue;
}
kvm_cvt(kvm$variable, address); // convert the address of the kvm var
printf("%s = %u\n", var_name, kvm$variable);
}
}


Using an attach Block to Call Interpreter Functions

The attach feature of SymbEL implements the use of the dynamic linking feature of Solaris. The dl functions allow an external library to be attached to a running process, thus making the symbols within that binary available to the program.

One of the features of dynamic linking is the ability to access symbols within the binary that is running. That is, a process can look into itself for symbols. This can also be accomplished in SymbEL by using an attach block with no name. With this trick, a script can call functions contained within the interpreter, but the author of the script has to know what functions are available. Currently, the only functions available to the user are listed in the se.se include file.

The most useful of these functions is the se_function_call function, which allows the script to call a SymbEL function indirectly. This function can be used for a callback mechanism. It’s the equivalent of a pointer to a function. For example, this script calls the function "callback" indirectly.

#include <se.se> 
main()
{
se_function_call("callback", 3, 2, 1);
}

callback(int a, int b, int c)
{
printf("a = %d b = %d c = %d\n", a, b, c);
}

The se_function_call function is declared with an ellipsis argument so any number of parameters can be passed (up to the internal limit) to the function being called. Be careful to pass the correct type and number of arguments.
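For readers who think in C, calling a function by name amounts to looking the name up and then calling through a function pointer. The following is a minimal sketch of that idea with a fixed three-argument signature. It is not how the interpreter implements se_function_call, which accepts a variable number of arguments, and call_by_name, sum3, and product3 are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef int (*callback_fn)(int a, int b, int c);

int sum3(int a, int b, int c)     { return a + b + c; }
int product3(int a, int b, int c) { return a * b * c; }

struct named_callback {
    const char *name;
    callback_fn fn;
};

/* The only functions that can be reached by name in this sketch. */
static const struct named_callback CALLBACKS[] = {
    { "sum3", sum3 },
    { "product3", product3 },
};

/* Looks up name and calls the matching function.
 * Returns 0 and stores the result on success, -1 if the name is unknown. */
int call_by_name(const char *name, int a, int b, int c, int *result)
{
    size_t i;

    for (i = 0; i < sizeof(CALLBACKS) / sizeof(CALLBACKS[0]); i++) {
        if (strcmp(name, CALLBACKS[i].name) == 0) {
            *result = CALLBACKS[i].fn(a, b, c);
            return 0;
        }
    }
    return -1;
}
```

The fixed signature is what lets the C compiler check the argument types. se_function_call has no such check, which is why the text asks you to pass the correct type and number of arguments yourself.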

An extreme example of this functionality is demonstrated below. The script calls on the interpreter to parse a function from an external file and then run the function. It’s an absurd example, but it demonstrates the tangled web that can be woven with attached functions and variables.

// this is the file "other_file" 
some_function(int param)
{
printf("hello there: %d\n", param);
}

// this is the demo script
#include <stdio.se>
#include <string.se>
#include <se.se>

attach "" {
extern ulong Lex_input;
extern int Se_errors;
yyparse();
se_fatal(string p);
};

int main()
{
Lex_input = fopen("other_file", "r");
if (Lex_input == 0) {
perror("fopen");
return 1;
}
yyparse();
if (Se_errors != 0) {
se_fatal("parse errors in other_file");
return 1;
}
se_function_call("some_function", 312);
return 0;
}

