Tuesday, December 21, 2010

Gdb Tips #2

Signals

info signals

info handle

Print a table of all the kinds of signals and how Gdb has been told to handle each one.

handle signal [keywords...]

Change the way Gdb handles signal signal.

The keywords allowed by handle are:

nostop – Do not stop the program when this signal happens

stop – Stop the program when this signal happens

print – Print a message when this signal happens

noprint – Do not mention the occurrence of the signal at all

pass/noignore – Allow your program to see this signal and handle it

nopass/ignore – Do not allow your program to see this signal


When a signal stops your program, the signal is not visible to the program until you continue.

Your program then sees the signal, if pass is in effect for the signal in question.

So after Gdb reports a signal, you can use the handle command with pass or nopass to control whether your program sees that signal when you continue.
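For example, to let the program receive SIGUSR1 and handle it itself, without stopping but still logging the occurrence (a session sketch; the signal is chosen arbitrarily):

```
(gdb) handle SIGUSR1 nostop print pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        Yes     Yes             User defined signal 1
```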

Monday, November 22, 2010

Gdb tips

Hi, as I'm trying to get a deeper understanding of the features included in Gdb, I started reading its manual and taking some key concepts to post here.

Today we will start with the concept of Stopping and Continuing the program under Gdb.


Stopping and Continuing

The principal purposes of using a debugger are so that you can stop your program before it terminates, and if it runs into trouble, you can investigate and find out why.



Breakpoints

A breakpoint makes your program stop whenever a certain point in the program is reached, and you can add conditions to control your program in finer detail.

You can use the break command to create a breakpoint. A breakpoint needs a location at which to stop; this can be a line number, a function name, or an exact address in the program.
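For instance, all of the following are valid ways to place a breakpoint (the file, function, and address names are illustrative; the parenthetical notes are annotations, not part of the commands):

```
(gdb) break 42            (line 42 of the current source file)
(gdb) break main          (the beginning of function main)
(gdb) break myfile.c:42   (line 42 of myfile.c)
(gdb) break *0x080483f0   (an exact instruction address)
```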



Watchpoint

A watchpoint is a special breakpoint that stops your program when the value of an expression changes. The commands to manage watchpoints are the same as for breakpoints.



Gdb assigns an integer number to each breakpoint and watchpoint you create.

In many commands that manage them, this integer number is all you need to choose the specific breakpoint or watchpoint.

Some Gdb commands accept a range of breakpoints on which to operate, like “5-7”, which operates only on breakpoints 5 through 7.

The variable $bpnum records the number of the breakpoint you've set most recently.



break location

Set a breakpoint at the given location



break

When called without arguments, break sets a breakpoint at the next instruction to be executed in the selected stack frame.



break … if cond

Set a breakpoint with condition cond. Evaluate the expression cond each time the breakpoint is reached, and stop only if the condition evaluates to true.
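For example, to stop in a function only when a counter exceeds a threshold (the function and variable names are hypothetical):

```
(gdb) break my_func if count > 10
```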



tbreak args

Set a breakpoint that will be deleted once reached.



info breakpoints [n] or i b [n]

Prints information about all breakpoints and watchpoints. The [n] argument is optional and restricts the output to breakpoint n.

The columns displayed are:

Num : Integer number representing the breakpoint

Type : Breakpoint or watchpoint

Disp : Disposition, whether the breakpoint is marked to be disabled or enabled when hit

Enb : Enabled breakpoints are marked with “y”; disabled ones with “n”

Address : The address of the breakpoint in your program.

What : Where the breakpoint is in the source of your program



If the breakpoint is conditional, info break shows the condition on the line following the affected breakpoint; breakpoint commands, if any, are listed after that.

Also, info break shows how many times each breakpoint has been hit.
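A typical listing might look like this (the address and source location are illustrative):

```
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004004f6 in main at test.c:5
        stop only if x > 0
        breakpoint already hit 1 time
```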

Gdb allows you to set any number of breakpoints at the same place in your program. There is nothing silly or meaningless about this; when the breakpoints are conditional, it is even useful.





Setting Watchpoints

You can use watchpoints to stop execution whenever the value of an expression changes, without having to predict a particular place where this may happen.

You can set a watchpoint on an expression even if the expression cannot be evaluated yet.



watch expr

Set a watchpoint for an expression.

The simplest use of this command is to watch the value of a single variable.



rwatch expr

Set a watchpoint that will break when the value of expr is read by the program.



awatch expr

Set a watchpoint that will break when expr is either read from or written into by the program.



info watchpoints

Prints a list of watchpoints, using the same format as info break.



To set a watchpoint on a fixed address, you must dereference it, as the address itself is just a constant number which will never change.

(gdb) watch 0x600850

Cannot watch constant value 0x600850

(gdb) watch *(int *) 0x600850

Watchpoint 1: *(int *) 6293584



Deleting Breakpoints


clear

Delete any breakpoint at the next instruction to be executed

clear [ location | function | linenum ]

Delete any breakpoint set at the specified location or function name or line number.

If you have multiple files, you can specify the filename before the locations above.


delete [breakpoints] [range]

Delete the breakpoints or watchpoints with the given numbers, or within the specified range.


Disabling Breakpoints


Sometimes we might prefer to disable a breakpoint or watchpoint rather than delete it.

The commands to enable or disable breakpoints and watchpoints take the integer number that identifies them; you can see it using info breakpoints.


A break or watchpoint can have any of four different states of enablement:

enabled : The breakpoint stops your program; this is the default for breakpoints set with the break command

disabled: The breakpoint has no effect.

enabled once: The breakpoint stops your program, but then becomes disabled

enabled for deletion: The breakpoint stops your program but is deleted afterward, as tbreak does

The commands to enable or disable breakpoints and watchpoints are:

disable [breakpoints] [range] or dis

Disable the specified breakpoints, or all if none are listed.

enable [breakpoints] [range]

Enable the specified breakpoints.

enable [breakpoints] once range...

Enable the specified breakpoints temporarily, for one hit only; then Gdb disables them.


enable [breakpoints] delete range...

Enable the specified breakpoints to work once; then Gdb deletes them.



Break Conditions

The simplest sort of breakpoint breaks every time your program reaches a specified place.

You can also specify a condition for a breakpoint.

A condition is just a Boolean expression in your programming language that Gdb evaluates each time your program reaches the breakpoint, stopping only if the condition is true.


Break conditions can be specified when a breakpoint is set, by using 'if' in the arguments to the break command, and they can also be changed at any time with the condition command.


condition bnum expression

Specify expression as the break condition for the breakpoint or watchpoint identified by bnum.

After you set a condition, breakpoint bnum stops your program only if the value of expression is true.


condition bnum

Remove the condition from breakpoint number bnum.


A special case of a breakpoint condition is to stop only when the breakpoint has been reached a certain number of times. For this we use:


ignore bnum count
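For example, to let breakpoint 2 be crossed 30 more times before stopping (the numbers are illustrative):

```
(gdb) ignore 2 30
Will ignore next 30 crossings of breakpoint 2.
```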


Breakpoint Command Lists

You can give any breakpoint a series of commands to execute when your program stops due to that breakpoint. You might want to print the values of certain expressions, or enable other breakpoints.


commands [range...]

command-list …

end

Specify a list of commands for the given breakpoint.

Type a line containing just end to terminate the commands.

To remove all commands from a breakpoint, type commands and follow it immediately with end.

With no argument, commands refers to the last breakpoint or watchpoint.

If the first command you specify in a command list is silent, the usual message about stopping at a breakpoint is not printed.

If none of the remaining commands print anything, you see no sign that the breakpoint was reached. silent is meaningful only at the beginning of a breakpoint command list.

The commands echo, output, and printf allow you to print precisely controlled output and are often useful in silent breakpoints. An example:

break foo if x>0

commands

silent

printf "x is %d\n", x

cont

end


One application for breakpoint commands is to compensate for one bug so you can test for another.

Put a breakpoint just after the erroneous line of code, give it a condition to detect the case in which something erroneous has been done, and give it commands to assign correct values to any variables that need them. End with the continue command so that your program does not stop, and start with the silent command so that no output is produced. Here is an example:

break 403

commands

silent

set x = y + 4

cont

end

Thursday, October 21, 2010

Found a new vulnerability in the Linux Kernel 2.6.30 and above

On October 19, 2010, Dan Rosenberg from VSR found a local privilege escalation vulnerability in the RDS protocol as implemented in the Linux Kernel.

Because kernel functions responsible for copying data between kernel and user space failed to verify that a user-provided address actually resided in the user segment, a local attacker could issue specially crafted socket function calls to write arbitrary values into kernel memory. By leveraging this capability, it is possible for unprivileged users to escalate privileges to root.

This vulnerability affects unpatched versions of the Linux kernel, starting from 2.6.30, where the RDS protocol was first included. Installations are only vulnerable if the CONFIG_RDS kernel configuration option is set, and if there are no restrictions on unprivileged users loading packet family modules, as is the case on most stock distributions.

Kernel versions affected

As the RDS protocol was included in the kernel for the first time in 2.6.30, every version starting from it and with CONFIG_RDS option set is vulnerable.

Fixing the problem

Distributions already ship kernel updates that you can use.
If you are using Debian, check your "Upgradeable Packages" in aptitude, go into the "kernel" menu, and update all packages in this menu; then you are safe.

You can also patch your kernel by hand using this commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=799c10559d60f159ab2232203f222f18fa3c4a5f

You can also prevent the RDS kernel module from loading; as root, do:


echo "alias net-pf-21 off" > /etc/modprobe.d/disable-rds

The exploit is already public; you can find it on the VSR Security site.

Reference

http://www.vsecurity.com/resources/advisory/20101019-1/

Tuesday, December 15, 2009

A Study on Stack

First of all, I'm not a stack hacker; I'm learning assembly and fell into this subject.

As this subject charmed me, I wanted an in-depth understanding of what this stack stuff is and how it works.

Most of the collected material came from Wikipedia, the Guide to Assembly Language Programming in Linux, and the #asm channel on Freenode.

I hope you enjoy the tour ;)



So what is a Stack?

Conceptually, a stack is a last-in-first-out (LIFO) data structure [1]. The operation of a stack is analogous to the stack of trays you find in cafeterias. The first tray removed from the stack of trays would be the last tray that had been placed on the stack. There are two operations associated with a stack: insertion and deletion. If we view the stack as a linear array of elements, insertion and deletion operations are restricted to one end of the array. Thus, the only element that is directly accessible is the element at the top-of-stack (TOS). In stack terminology, the insert and delete operations are referred to as push and pop, respectively.


Let's assume that we are inserting the numbers 10, 11, 12, 13 into a stack in ascending order. When the numbers are deleted from the stack, they come out in the reverse order of insertion. That is, 13 is removed first, and so on. After the deletion of the last number, the stack is said to be in the empty state.


Deletion of data items proceeds the same way, always from the top of the stack.


Implementation of the Stack

The memory space reserved in the stack segment is used to implement the stack. The registers SS and ESP are used to implement the stack. The top-of-stack, which points to the last item inserted into the stack, is indicated by SS:ESP.

The key stack implementation characteristics are as follows:

  • Only words (16-bit data) or doublewords (32-bit data) are saved on the stack, never a single byte.

  • The stack grows toward lower memory addresses. Since we graphically represent memory with addresses increasing from the bottom of a page to the top, we say that the stack grows downward.

  • Top-of-stack (TOS) always points to the last data item placed on the stack. The TOS always points to the lower byte of the last word pushed onto the stack. For example, when we push 0xab78 onto the stack, the TOS points to the 0x78 byte.

When a word is pushed onto the stack, ESP is first decremented by two, and then the word is stored at SS:ESP. Since the IA-32 processors use the little-endian [2] byte order, the higher-order byte is stored in the higher memory address. For instance, when we push 0xab78, the stack expands by two bytes and ESP is decremented by two to point to the last data item.

The stack full condition is indicated by the zero offset value (ESP=0). If we try to insert a data item into a full stack, stack overflow occurs.

Retrieving a 32-bit data item from the stack causes the offset value to increase by four to point to the next data item on the stack.


Stack Operations

Basic Instructions

The stack data structure allows two basic operations: insertion of a data item into the stack (called the push operation) and deletion (called the pop operation). These two operations are allowed on word or doubleword data items. The syntax is:



push source

pop destination



The operand of these two instructions can be a 16- or 32-bit general-purpose register, a segment register, or a word or doubleword in memory. In addition, the source for the push instruction can be an immediate operand of size 8, 16, or 32 bits.

On an empty stack, the statements:



push 45b1H

push ab21cc45H



would result in the stack growing down by 6 bytes, with ESP pointing to the last byte pushed of the hex number ab21cc45H, that is, 45.

Now, if we execute:


pop EBX



on this stack would result in the EBX register taking the value ab21cc45H.



Additional Instructions

Stack Operations on Flags

The push and pop operations cannot be used to save or restore the flags register. For this, two special versions of these instructions are provided:



pushfd (push 32-bit flags)

popfd (pop 32-bit flags)



These instructions do not need any operands. For operating on the 16-bit flags register (FLAGS), we can use the pushfw and popfw instructions. If we use pushf, the default operand size selects either pushfd or pushfw. In our programs, since our default is 32-bit operands, pushf is used as an alias for pushfd. However, we use pushfd to make the operand size explicit. Similarly, popf can be used as an alias for either popfd or popfw.



Stack Operations on All General-Purpose Registers

The instruction set also has special pusha and popa instructions to save and restore the eight general-purpose registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI. These registers are pushed in the order specified.



Uses of the Stack

The stack is used for three main purposes: as a scratchpad to temporarily store data, to transfer program control, and to pass parameters during a procedure call.



Temporary Storage of Data

The stack can be used to store data on a temporary basis. For example, consider exchanging the contents of two 32-bit variables that are in the memory: value1 and value2.

We cannot use:


xchg value1, value2


because both operands of xchg are in the memory. The code


mov EAX, value1

mov EBX, value2

mov value1, EBX

mov value2, EAX


works, but it uses two 32-bit registers. The code requires four memory operations. However, due to the limited number of general-purpose registers, finding spare registers that can be used for temporary storage is nearly impossible in almost all programs.

What if we need to preserve the contents of the EAX and EBX registers? In this case, we need to save these registers before using them and restore them later as shown below:


;save EAX and EBX registers on the stack

push EAX

push EBX

;EAX and EBX register can now be used

mov EAX, value1

mov EBX, value2

mov value1, EBX

mov value2, EAX

;restore EAX and EBX registers from the stack

pop EBX

pop EAX



This code requires eight memory accesses. Because the stack is a LIFO data structure, the sequence of pop instructions is a mirror image of the push instruction sequence.

An elegant way of exchanging the two values is:


push value1

push value2

pop value1

pop value2


Notice that the above code does not use any general-purpose registers and requires eight memory operations as in the other example. Another point to note is that push and pop instructions allow movement of data from memory to memory (between data and stack segments, for example). This is a special case because mov instructions do not allow memory-to-memory data transfer.


The stack is frequently used as a scratchpad to save and restore registers. The necessity often arises when we need to free up a set of registers so they can be used by the current code. This is often the case with procedures.

It should be clear from these examples that the stack grows and shrinks during the course of a program execution. It is important to allocate enough storage space for the stack, as stack overflow and underflow could cause unpredictable results, often causing system errors.


Transfer of Control

The previous discussion concentrated on how we, as programmers, can use the stack to store data temporarily. The stack is also used by some instructions to store data temporarily. In particular, when a procedure is called, the return address of the instruction is stored on the stack so that the control can be transferred back to the calling program.


Parameter Passing

Another important use of the stack is to act as a medium to pass parameters to the called procedure.



Procedure Instructions

To call a procedure, we use the following statement:



call proc-name



The assembler replaces proc-name by the offset value of the first instruction of the called procedure.


How Is Program Control Transferred?

The offset provided in the call instruction is not an absolute value (i.e., the offset is not relative to the start of the code segment pointed to by the CS register), but a relative displacement in bytes from the instruction following the call instruction.


The EIP register points to the next instruction to be executed. This instruction should be executed after returning from this procedure. The processor makes a note of this by pushing the contents of the EIP register onto the stack.


The following is a summary of the actions taken during a procedure call:


ESP = ESP – 4 ;push return address onto the stack

SS:ESP = EIP

EIP = EIP + relative displacement ;update EIP to point to the procedure



The ret Instruction

The ret (return) instruction is used to transfer control from the called procedure back to the calling procedure. How will the processor know where this instruction is loaded?

Remember that the processor made a note of this when the call instruction was executed. When the ret instruction is executed, the return address is recovered from the stack.

So:



EIP = SS:ESP ;pop return address at TOS into EIP

ESP = ESP + 4 ;update TOS by adding 4 to ESP



Parameter Passing

In assembly language, the calling procedure first places all the parameters needed by the called procedure in a mutually accessible storage area (usually registers or memory). Only then can the procedure be invoked. There are two common methods, depending on the type of storage area used to pass parameters: the register method and the stack method.


Register Method

The advantages are:

  1. The register method is convenient and easier for passing a small number of arguments.

  2. This method is also faster because all the arguments are available in registers.


Disadvantages:

  1. There is a limited number of general-purpose registers in the CPU.

  2. Thus, it is necessary to temporarily save the contents of these registers on the stack to free them for parameter passing before calling a procedure, and to restore them after returning from the called procedure.



Stack Method

In this method of parameter passing, all arguments required by a procedure are pushed onto the stack before the procedure is called. As an example, let us consider passing the two parameters required by a procedure. This can be done by:



push number1

push number2

call procedure



After executing the call instruction, which automatically pushes the EIP contents onto the stack, the return address sits on top of the stack, above the two parameters.


Reading the two arguments – number1 and number2 – is tricky. Since the parameter values are buried inside the stack, first we have to pop the EIP value to read the two arguments. This can be done, for example, by


pop EAX ;EIP

pop EBX ;number1

pop ECX ;number2



in the called procedure. Since we have removed the return address (EIP) from the stack, we will have to restore it by

push EAX

so that TOS is pointing to the return address.


The main problem with this code is that we need to set aside general-purpose registers to copy parameter values. This means that the calling procedure cannot use these registers for any other purpose. Worse still, what if you want to pass 10 parameters? One way to free up registers is to copy the parameters from the stack to local data variables, but this is impractical and inefficient.

The best way to get parameter values is to leave them on the stack and read them from the stack as needed. Since the stack is a sequence of memory locations, ESP + 4 points to number2, and ESP + 8 to number1. For instance:


mov EBX, [ESP+4]


can be used to access number2, but this causes a problem: the stack pointer is updated by the push and pop instructions, so the relative offset changes with the stack operations performed in the called procedure. This is not a desirable situation.


There is a better alternative: we can use the EBP register instead of ESP to specify an offset into the stack segment. For example, we can copy the value of number2 into the EAX register by:


mov EBP,ESP

mov EAX,[EBP+4]


This is the usual way of pointing to the parameters on the stack. Since every procedure uses the EBP register to access parameters, the EBP register should be preserved. Therefore, we should save the contents of the EBP register before executing the


mov EBP,ESP


statement. We, of course, use the stack for this. Note that


push EBP

mov EBP,ESP


causes the parameter displacement to increase by four bytes (the size of EBP).

The information stored in the stack – parameters, return address, and the old EBP value – is collectively called the stack frame.

So before returning from the procedure, we should use


pop EBP


to restore the original value of EBP.

Now the problem is that the eight bytes of the stack occupied by the two doubleword arguments are no longer useful. One way to free these eight bytes is to increment ESP by eight after the call statement, as shown below:


push number1

push number2

call procedure

add ESP,8


For example, C compilers use this method to clear parameters from the stack. The above assembly language code segment corresponds to the


procedure(number2, number1);


function call in C.

Rather than having the calling procedure adjust the stack, the called procedure can also clear the stack. Note that we cannot write


procedure:

add ESP,8

ret


because when ret is executed, ESP should point to the return address on the stack. The solution lies in the optional operand that can be specified in the ret statement. The format is:


ret optional-value


which results in the following sequence of actions:


EIP = SS:ESP

ESP = ESP + 4 + optional-value


The optional-value should be a number. Since the purpose of the optional value is to discard the parameters pushed onto the stack, this operand takes a positive value.
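Putting these pieces together, a called procedure that reads its two doubleword arguments through EBP and clears them on return might look like this (a NASM-style sketch with hypothetical names, not taken from the text):

```
procedure:
    push EBP            ; save caller's EBP
    mov  EBP, ESP       ; EBP now points at the saved EBP
    mov  EAX, [EBP+8]   ; number2 (displacement grew by 4 after push EBP)
    mov  EBX, [EBP+12]  ; number1
    ; ... procedure body ...
    pop  EBP            ; restore caller's EBP
    ret  8              ; return and discard the two 4-byte arguments
```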


Preserving Calling Procedure State

It is important to preserve the contents of the registers across a procedure call. The necessity for this is illustrated by the following code:


mov ECX, count

repeat:

call compute

… …

… …

loop repeat


The code invokes the compute procedure count times. The ECX register maintains the number of remaining iterations. Recall that, as part of the loop instruction execution, the ECX register is decremented by 1 and, if not 0, starts another iteration.

Suppose, now, that the compute procedure uses the ECX register during its computation.

Then, when compute returns control to the calling program, ECX would have changed, and the program logic would be incorrect.

Since there are a limited number of registers and registers should be used for writing efficient code, registers should be preserved. The stack is used to save registers temporarily.


Which registers should be saved?

The answer to this question is simple: save those registers that are used by the calling procedure but changed by the called procedure. This leads to the following question: which procedure, the calling or the called, should save the registers?


We assume that the called procedure saves the registers that it uses and restores them before returning to the calling procedure. This also conforms to the modular program design principles.


ENTER and LEAVE instructions

The instruction set has two instructions to facilitate stack frame allocation and release on procedure entry and exit. The enter instruction can be used to allocate a stack frame on entering a procedure. The format is:


enter bytes, level


The first operand, bytes, specifies the number of bytes of local variable storage we want in the new stack frame. The statement:


enter XX,0



is equivalent to:


push EBP

mov EBP, ESP

sub ESP, XX


The leave instruction releases the stack frame allocated by the enter instruction. It does not take any operands. The leave instruction effectively performs the following:


mov ESP, EBP

pop EBP


We use the leave instruction before the ret instruction, as shown in the following template for procedures:


proc-name:

enter XX,0

. . .

procedure-body

. . .

leave

ret YY


The XX value is nonzero only if our procedure needs local variable space on the stack frame. The YY value is used to clear the arguments passed to the procedure.


[1] http://en.wikipedia.org/wiki/LIFO_%28computing%29

[2] http://en.wikipedia.org/wiki/Endianness#Little-endian


Tuesday, August 25, 2009

Kernel Virtual Filesystem - VFS

The VFS is an abstraction layer to support the implementation of many filesystems.
It specifies a common file model capable of representing any conceivable filesystem's general features and behavior. It's like a framework.

The abstraction layer works by defining the basic conceptual interfaces and data structures that all filesystems support. Filesystems mold their view of concepts such as "this is how I open files" and "this is what a directory is to me" to match the expectations of the VFS.

To the VFS and the rest of the kernel, each filesystem looks the same.
All filesystems rely on the VFS to allow them not only to coexist, but also to interoperate.

Unix Filesystems
Historically, Unix has provided four basic filesystem-related abstractions: files, directory entries, inodes and mount points.

A filesystem is a hierarchical storage of data adhering to a specific structure.
Filesystems contain files, directories, and associated control information.
In Unix, filesystems are mounted at a specific mount point in a global hierarchy known as the namespace.

A file is an ordered string of bytes. The first byte marks the beginning of the file and the last byte marks the end of the file. Each file is assigned a human-readable name for identification by both the system and the user.

Files are organized in directories.
A directory is analogous to a folder and usually contains related files.
Directories can also contain subdirectories; in this fashion, directories may be nested to form paths. Each component of a path is called a directory entry.
In a path such as /home/wolfman/butter, the root directory /, the directories home and wolfman, and the file butter are all directory entries, called dentries.
In Unix, directories are actually normal files that simply list the files contained therein.
Because a directory is a file to the VFS, the same operations performed on files can be performed on directories.

Unix systems separate the concept of a file from any associated information about it, such as access permissions, size, owner, creation time, and so on. This information is sometimes called file metadata and is stored in a separate data structure from the file, called the inode (index node).

All this information is tied together with the filesystem's control information, which is stored in the superblock. The superblock is a data structure containing information about the filesystem as a whole.
Filesystem metadata includes information about both the individual files and the filesystem as a whole.

Non-Unix filesystems, such as FAT or NTFS, still work in Linux, but their filesystem code must provide the appearance of these concepts.
For example, even if a filesystem does not support distinct inodes, it must assemble the inode data structure in memory as if it did.
And if a filesystem treats directories as a special object, to the VFS it must represent directories as mere files.

VFS Objects and their Data Structures
The four primary object types of the VFS are:
  • The superblock object, which represents a specific mounted filesystem.
  • The inode object, which represents a specific file.
  • The dentry object, which represents a directory entry, a single component of a path.
  • The file object, which represents an open file as associated with a process.
Note that because the VFS treats directories as normal files, there is not a specific directory object.

An operations object is contained within each of these primary objects. These objects describe the methods that the kernel invokes against the primary objects. They are:
  • The super_operations object, which contains the methods that the kernel can invoke on a specific filesystem.
  • The inode_operations object, which contains the methods that the kernel can invoke on a specific file.
  • The dentry_operations object, which contains the methods that the kernel can invoke on a specific directory entry.
  • The file_operations object, which contains the methods that a process can invoke on an open file.
The operations objects are implemented as a structure of pointers to functions that operate on the parent object.

These are the basics of the VFS.
Later I'll add some more stuff here, but that's it for now!!
Cya

Wednesday, August 19, 2009

Kernel Compiling Debian - Compilando Kernel no Debian

This is the classic method to compile a kernel, outside the Debian way. (The link below is in English.)

Debian doc


Wednesday, July 01, 2009

Synchronizing is the "key"

Through my kernel studies I found very good material about synchronization that I want to share with you.

This first post tends to be only conceptual; I'll write another dealing with the tools that Linux provides for us to synchronize accesses.

In a shared memory application, care must be taken to ensure that shared resources are protected from concurrent access.
Concurrent access of shared data is a recipe for instability!

Properly protecting shared resources can be tough.
When SMP was introduced in the 2.0 kernel, the need for synchronization arose.
SMP means kernel code can run simultaneously on two or more processors; consequently, without protection, kernel code running on two different processors can access shared data at exactly the same time.

With the 2.6 kernel, the Linux kernel became preemptive: the scheduler can preempt kernel code at virtually any point and reschedule another task.

Critical Regions
Code paths that access and manipulate shared data are called critical regions.
It is usually unsafe for multiple threads of execution to access the same resource simultaneously.
To prevent concurrent access, the programmer must ensure that the code executes atomically, that is, the code completes without interruption as if the entire critical region were one indivisible instruction.

For a first example, let's say we have a cash machine that withdraws money from a person's bank account.

After the user has asked for a specific amount of money, the cash machine needs to ensure that the money actually exists in that user's account. If it does, it then needs to deduct the withdrawal from the total funds available.

Let's presume that another deduction in the user's funds is happening at the same time.
Assume that the user's spouse is initiating another withdrawal at another ATM, or electronically transferring funds out of the account, or the bank is deducting a fee from the account.

These systems would have similar code:
  1. Check whether the deduction is possible
  2. Compute the new total funds
  3. Execute the physical deduction
Presume that the first deduction is an ATM withdrawal of $100 and that the second is the bank applying a fee of $10.
If the customer has a total of $105 in the bank, one of these transactions cannot correctly complete without sending the account into the red.

We would expect something like this:
  1. The fee transaction happens first
  2. Ten dollars is less than $105, so ten is subtracted from 105 to get a new total of $95
  3. $10 is pocketed by the bank
Then the withdrawal comes along and fails, because $95 is less than $100.

Now, here is how synchronization fails.
Assume that the two transactions are initiated at roughly the same time.
Both transactions verify that sufficient funds exist: $105 is more than both $100 and $10, so all is good.
Then the withdrawal process subtracts $100 from $105, yielding $5.
The fee transaction then does the same, subtracting $10 from $105 and getting $95.
The withdrawal process updates the user's new total available funds to $5.
Now the fee transaction also updates the new total, resulting in $95.
Free money!!

The bank system must lock the account during certain operations, making each transaction atomic.
Such transactions must occur in their entirety, without interruption, or not occur at all.
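The ATM scenario can be sketched in C with a mutex making the check-then-deduct sequence one atomic critical region (a minimal illustration using pthreads; the account variable and function name are made up for this post, not from any real banking code):

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical account holding the $105 from the example above. */
static long balance = 105;
static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check-then-deduct as one atomic step: no other thread can run
 * between the check and the subtraction. Returns 1 on success. */
int try_withdraw(long amount)
{
        int ok = 0;

        pthread_mutex_lock(&account_lock);
        if (balance >= amount) {   /* 1. check the deduction is possible */
                balance -= amount; /* 2-3. compute and execute it       */
                ok = 1;
        }
        pthread_mutex_unlock(&account_lock);
        return ok;
}
```

With the lock held across both steps, the $100 withdrawal and the $10 fee serialize: whichever runs second sees the already-updated balance and fails cleanly instead of handing out free money.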

Locking
A lock provides a mechanism much like a door. Imagine the room beyond the door as the critical region.
Inside the room, only one thread of execution can be present at a given time. When a thread enters the room, it locks the door behind it.
When the thread is finished manipulating the shared data, it leaves the room and unlocks the door.
If another thread reaches the door while it is locked, it must wait for the thread inside to exit the room and unlock the door before it can enter.
Threads hold locks; locks protect data.

Locks are entirely a programming construct that the programmer must take advantage of.
Locks come in various shapes and sizes; Linux alone implements a handful of different locking mechanisms. The difference among them is their behavior when the lock is contended (already in use): some locks simply busy wait (spin in a tight loop until the lock becomes available), whereas others put the current task to sleep until the lock becomes available.

Another important thing about locks is that they are implemented using atomic operations, which ensure that no race exists.

Deadlocks
A deadlock is a condition involving one or more threads of execution and one or more resources, such that each thread is waiting for one of the resources, but all the resources are already held.

A good analogy is a four-way traffic stop. If each car at the stop decides to wait for the other cars before going, no car will ever go and we have a traffic deadlock.

Consider n threads and n locks. If each thread holds a lock that another thread wants, all threads block while waiting for their respective locks to become available. The most common example involves two threads and two locks, which is often called the deadly embrace.

Each thread is waiting for the other, and neither will ever release its original lock; therefore, neither lock will ever become available.

To prevent these scenarios, follow a few rules:
  • Lock ordering is vital. Nested locks must always be obtained in the same order
  • Prevent starvation. "If foo does not occur, will bar wait forever?"
  • Do not double acquire the same lock
  • Design for simplicity
A good example follows:

Assume you have the cat, dog and fox locks, which protect data structures of the same name.
Now assume you have a function that needs to work on all three of these data structures simultaneously, perhaps to copy data between them.
The data structures require locking to ensure safe access. If one function acquires the locks in the order cat, dog and fox, then every other function must obtain these locks in this same order.

For example, it is a potential deadlock to first obtain the fox lock and then obtain the dog lock (because the dog lock must always be acquired prior to the fox lock).

Thread one is waiting for the fox lock, which thread two holds, while thread two is waiting for the dog lock, which thread one holds. Neither ever releases its lock, and hence both wait forever.
If locks were always obtained in the same order, a deadlock in this manner would not be possible.

Whenever locks are nested within other locks, a specific ordering must be obeyed.
A common practice is to release locks in the inverse of the order in which they were acquired.
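A minimal sketch of the cat/dog/fox ordering rule with pthread mutexes (the locks, the integer "data structures" and the function are invented for this illustration):

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical data protected by the cat, dog and fox locks from the text. */
static pthread_mutex_t cat_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dog_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fox_lock = PTHREAD_MUTEX_INITIALIZER;
static int cat = 1, dog = 2, fox = 3;

/* Every function that needs more than one of these locks must take
 * them in the agreed order (cat, dog, fox) and, by common practice,
 * release them in reverse. Returns 0 on success. */
int copy_between_all(void)
{
        pthread_mutex_lock(&cat_lock);
        pthread_mutex_lock(&dog_lock);
        pthread_mutex_lock(&fox_lock);

        fox = dog;                /* work on all three structures */
        dog = cat;

        pthread_mutex_unlock(&fox_lock);
        pthread_mutex_unlock(&dog_lock);
        pthread_mutex_unlock(&cat_lock);
        return 0;
}
```

As long as every code path acquires these three locks in this same order, the circular wait that defines a deadlock can never form.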


Tuesday, September 23, 2008

TCP/IP Address Resolution Protocol - ARP

The basic operation of ARP is a request/response pair of transmissions on the local network.
The source transmits a broadcast containing information about the destination. The destination then responds with a unicast back to the source, telling the source the destination's hardware address.

ARP Message types and address designations

There are two different messages sent in ARP, one from the source to the destination and one from the destination to the source.

The sender is the one transmitting the message and the target is the one receiving it. The identities of sender and target change with each message they trade.
  • Request: For the initial request, the sender is the source, the device with the IP datagram to send, and the target is the destination.
  • Reply: The sender is the destination; it replies to the source, which becomes the target.
Each message has two addresses (layer two and layer three), so four different addresses are sent in each message.
  • Sender Hardware Address: The layer two address (MAC)
  • Sender Protocol Address: The layer three address (IP)
  • Target Hardware Address: The layer two address (MAC)
  • Target Protocol Address: The layer three address (IP)
ARP General Operation

The overall operation that ARP uses to resolve an address is described in the next nine steps:

  1. Source Device Checks Cache: The source device checks its cache to see if it already has a resolution for the destination device; if it does, it can skip to the last step.

  2. Source Device Generates ARP Request Message: It puts its own data link layer address in the Sender Hardware Address field and its own IP address in the Sender Protocol Address field.
    It fills in the IP address of the destination as the Target Protocol Address; the Target Hardware Address, the value we must discover, is left blank.

  3. Source Device Broadcasts ARP Request Message

  4. Local Devices Process ARP Request Message: The message is received by each device on the local network, and each looks for a match on the Target Protocol Address.

  5. Destination Device Generates ARP Reply Message: The device whose IP address matches will generate an ARP Reply message.
    It takes the Sender Hardware Address and Sender Protocol Address fields from the ARP Request message and uses them as the values for the Target Hardware Address and Target Protocol Address of the reply.
    Finally it fills the Sender Protocol Address with its IP and the Sender Hardware Address with its MAC.

  6. Destination Device Updates ARP Cache: The destination will now add an entry to its own ARP cache containing the hardware and IP addresses of the source that sent the ARP Request.

  7. Destination Device Sends ARP Reply Message: This is sent as unicast to the source device.

  8. Source Device Processes ARP Reply Message

  9. Source Device Updates ARP Cache

ARP Message Format

The message format includes a field describing the type of message (its operational code, or opcode) and information on both the layer two and layer three addresses.
In order to support addresses that may vary in length, the format specifies the type of protocol used at both layer two and layer three and the length of the addresses used at each of these layers.
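For IPv4 over Ethernet, those fields lay out as below (a sketch of the classic RFC 826 format; the struct name is ours, and the packed attribute is GCC-specific):

```c
#include <assert.h>
#include <stdint.h>

/* ARP message layout for IPv4 over Ethernet (RFC 826). The htype,
 * ptype, hlen and plen fields are what let the format describe
 * addresses of varying length; the opcode distinguishes a Request
 * from a Reply. */
struct arp_ipv4 {
        uint16_t htype;           /* hardware type: 1 = Ethernet   */
        uint16_t ptype;           /* protocol type: 0x0800 = IPv4  */
        uint8_t  hlen;            /* hardware address length: 6    */
        uint8_t  plen;            /* protocol address length: 4    */
        uint16_t oper;            /* opcode: 1 = Request, 2 = Reply */
        uint8_t  sha[6];          /* Sender Hardware Address       */
        uint8_t  spa[4];          /* Sender Protocol Address       */
        uint8_t  tha[6];          /* Target Hardware Address       */
        uint8_t  tpa[4];          /* Target Protocol Address       */
} __attribute__((packed));
```

Note the four address fields at the end: they are exactly the sender/target hardware and protocol addresses described earlier, giving a 28-byte message for IPv4 over Ethernet.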

Proxy ARP

ARP was designed to be used by devices directly connected on a local network.
Each device should be capable of sending both unicast and broadcast transmissions directly to each other.
So if device A and device B are separated by a router, they aren't local to each other. Device A would not send directly to B or vice versa; they would send to the router at layer two, and are considered "two hops apart" at layer three.

Proxy ARP Operation

The router that sits between the two networks is configured to respond to device A's broadcast on behalf of device B. It does not send back to A the hardware address of device B; since they are not on the same network, A cannot send directly to B anyway.

Instead, the router sends A its own hardware address. A then sends to the router, which forwards the message to B on the other network. The router does the same thing on A's behalf for B.

Proxy ARP provides flexibility for networks where hosts are not all actually on the same physical network.

Remember, plain ARP cannot function between devices on different physical networks.

Friday, September 19, 2008

Understanding the Address Resolution

The network layer (OSI Model) or Internet layer (TCP/IP Model) is where the internetworking protocols are defined.

These two layers are intimately related, they perform different tasks but as neighbors in the protocol stack, must cooperate with each other.

So we need some kind of "glue" to join these two layers and let the "conversation" between them happen. The main job performed by this "glue" is address resolution, or providing mappings between layer two and layer three addresses. This resolution can be done in either direction, and is represented by the two TCP/IP protocols ARP and RARP.

Address Resolution Protocol (ARP)

Communication on an internetwork is accomplished by sending data at layer three using a network layer address, but the actual transmission of that data occurs at layer two using a data link layer address.

Every device with a full networking stack has both a layer two and a layer three address. It is therefore necessary to define a way of linking these addresses together.
This is done by taking a network layer address (IP for example) and determining what data link layer address (MAC Address) goes with it.
This is called address resolution.

There is a lot of discussion about where address resolution fits in the OSI model, and it's hard to pin down. The OSI is a model, and as with most rules there are exceptions (:o nice uh?); address resolution is one of them.

In the OSI model, two layers deal with addressing: the data link layer and the network layer.
The physical layer is concerned only with sending at the bit level.
The layers above the network layer work with network layer addresses (IP).

So, why addressing at two different layers?
The different types of addresses are for different purposes.

Layer two addresses are used for local transmissions between hardware devices that can communicate directly, so it deals with directly-connected devices (on the same network).
Layer three addresses are used in internetworking, to create some kind of "virtual" network at the network layer, so it deals with indirectly-connected devices (as well as directly-connected).

The big problem is: IP addresses are too high level for the physical hardware on network to deal with.

So, we have two methods for doing this address resolution: one more efficient but less flexible, the other less efficient but more flexible.

Direct Mapping

Network layer addresses must be resolved into data link layer addresses numerous times during the travel of each datagram across an internetwork and the easiest method of accomplishing this is to do direct mapping between the two types of addresses (The IP and the MAC addresses).

The idea here is to choose addressing schemes for layer two and layer three such that you can determine one from the other using a simple algorithm: you take the layer three address and, with a short procedure, convert it to a layer two address. Thus whenever you have the layer three address, you already have the layer two address.

The simplest example of direct mapping would be if we used the same structure and semantics for both data link and network layer addresses. Then, determining the layer two address is a simple matter of selecting a certain portion of the layer three address.

Example:
Consider a simple LAN like ARCNet, which uses a short 8-bit data link layer address.
We could set up an IP network on such a LAN by taking a class C (or /24) network and using the ARCNet data link layer address as the last octet. So with a network at 222.101.33.0/24 we could assign physical address #1 to IP 222.101.33.1 and physical address #29 to 222.101.33.29.

So to get the hardware address of a device, you just use the final 8 bits of the IP address. This is highly efficient because it requires no exchange of data on the network at all.
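The ARCNet example reduces to a one-line function (a hypothetical sketch; the function name is ours, and the IP is taken as a 32-bit integer in host order):

```c
#include <assert.h>
#include <stdint.h>

/* Direct mapping for the ARCNet example above: the data link layer
 * address is simply the last octet of the IP address. */
uint8_t arcnet_hw_addr(uint32_t ip)
{
        return (uint8_t)(ip & 0xFF);
}
```

For 222.101.33.29 (0xDE65211D as a 32-bit value) this yields hardware address 29, with no network traffic involved at all.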


Direct Mapping is not possible with large hardware addresses


Direct mapping only works when it's possible to express the data link layer address within the network layer address. The same IP of 222.101.33.29 on an Ethernet network will have a different type of hardware address: a "hard-wired" address that comes with the card itself.

These addresses are 48 bits long, not 8.
The layer two address is larger than the layer three address, so there's no way to do a direct mapping here.

Dynamic Address Resolution

This is the alternative to direct mapping resolution.

Have you seen limousine drivers waiting at the airport to pick up a person they do not know personally?

This is similar to our problem: they know the name of the person they must transport, but not the person's face. To find the person, they hold up a card bearing that person's name. Everyone other than that person ignores the card, but hopefully the individual being sought will recognize it and approach the driver.

So, let's look at the computer world: Device A wants to send to device B but knows only device B's network layer address (its "name") and not its data link layer address (its "face"). It broadcasts a layer two frame containing the layer three address of device B; this is like holding up the card with someone's name on it. The devices other than B don't recognize this layer three address and ignore it.
Device B, which knows its own network layer address, recognizes this broadcast and sends a direct response back to device A, telling device A what device B's layer two address is.

The good thing is that there is no need for any specific relationship between the network layer address and the data link layer address.

Dynamic Address Resolution Caching

Most devices on a local network send to only a small handful of other physical devices; this is called locality of reference.

When you send a request to an Internet Web site from your office PC, it needs to go first to your company network's local router, so you'll need to resolve the router's layer two address.
If later you click a link on that site, that request will also need to go to the router.

Having to do a fresh resolution each time is stupid. It would be like having to look up the phone number of your best friend every time you want to call to say hello.

After a device's network layer address is resolved to a data link layer address, the mapping between the two is kept in the device's memory for a period of time. The next time it needs the layer two address, a quick lookup in its cache is made.
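A toy version of such a cache might look like this in C (a deliberately tiny sketch: real implementations also handle collisions, aging and invalidation; all names and sizes here are invented, and a small self-check is included):

```c
#include <stdint.h>
#include <string.h>

/* A minimal ARP cache: resolved IP-to-MAC pairs kept in a small table
 * so repeated sends can skip the request/reply exchange. */
#define CACHE_SLOTS 8

struct arp_entry {
        uint32_t ip;
        uint8_t  mac[6];
        int      valid;
};

static struct arp_entry cache[CACHE_SLOTS];

void cache_store(uint32_t ip, const uint8_t mac[6])
{
        struct arp_entry *e = &cache[ip % CACHE_SLOTS];
        e->ip = ip;
        memcpy(e->mac, mac, 6);
        e->valid = 1;
}

/* Returns 1 and fills mac on a hit; 0 means a fresh resolution is needed. */
int cache_lookup(uint32_t ip, uint8_t mac[6])
{
        const struct arp_entry *e = &cache[ip % CACHE_SLOTS];
        if (!e->valid || e->ip != ip)
                return 0;
        memcpy(mac, e->mac, 6);
        return 1;
}

/* Self-check: a stored mapping is found, an unknown IP is not. */
int cache_selftest(void)
{
        uint8_t mac[6] = {1, 2, 3, 4, 5, 6}, out[6];
        cache_store(0x0A000001u, mac);
        if (!cache_lookup(0x0A000001u, out)) return 0;
        if (out[5] != 6) return 0;
        if (cache_lookup(0x0A000002u, out)) return 0;
        return 1;
}
```

Only the first send to a given IP pays the cost of a broadcast; every later send is a memory lookup.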

Other enhancements to dynamic resolution

When we send a request that needs to go to our local router, we resolve its address and send it the request.

A reply comes back through the router to be sent to us, so the router needs our address and has to do a dynamic resolution on us even though we just exchanged frames. Again: stupid. Instead, we can improve efficiency through cross-resolution: when device A resolves the address of device B, device B also adds an entry for device A to its cache.

As devices on a local network are going to talk to each other fairly often, when A resolves B's network layer address, it broadcasts a frame that devices C, D, E and so on will see.
Why not have them also update their cache tables with resolution information that they see, for future use?

So that's it; most of this was a synopsis of the TCP/IP Guide. If you wanna take a look, it's worth it.

Thanks!!

Wednesday, October 18, 2006

Compiling and installing cx_Oracle with python2.4 on Debian

cx_Oracle is an Oracle connection interface for Python that follows all the specifications of the Python database API, with a few exceptions. Since the versions available for download on the site are rpm or tar.gz, we'll grab the tar.gz and compile everything from scratch. At least that way we can be sure everything is compiled properly.

First, download cx_Oracle.

Install the python2.4-dev package, which cx_Oracle uses at compile time:

#aptitude install python2.4-dev

Now unpack it and enter the directory:
#tar -zxvf cx_Oracle-4.2.tar.gz
#cd cx_Oracle-4.2

Now let's compile the sources. Make sure you have the build packages installed: gcc, binutils, make...

To compile, run:
#python setup.py build

With the right packages and, of course, your ORACLE_HOME set and LD_LIBRARY_PATH=$ORACLE_HOME/lib configured... if you want to make sure everything is in order, run:

#ldconfig

It should compile fine, perhaps with a few warnings; if it compiles, everything is OK.
Now let's install it:

#python setup.py install

This drops the cx_Oracle binary into /usr/lib/python2.4/site-packages/ and you are ready to use it.

If you want to make sure the cx_Oracle binary is in perfect shape, go to /usr/lib/python2.4/site-packages/ and run:

#ldd cx_Oracle.so
linux-gate.so.1 => (0xffffe000)
libclntsh.so.10.1 => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server//lib/libclntsh.so.10.1 (0xa714f000) libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xa712a000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xa6ff9000)
libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server//lib/libnnz10.so (0xa6df4000)
libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xa6df0000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xa6dca000)
libnsl.so.1 => /lib/tls/i686/cmov/libnsl.so.1 (0xa6db4000)
/lib/ld-linux.so.2 (0x75555000)

Everything should look just like this; if there is a "Not found" anywhere, check whether your ORACLE_HOME or LD_LIBRARY_PATH is configured correctly.

Now, to test it, just check that python imports the module:

#python
Python 2.4.4c0 (#2, Jul 30 2006, 15:43:58)
[GCC 4.1.2 20060715 (prerelease) (Debian 4.1.1-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cx_Oracle
>>> cx_Oracle

>>>


Well, that's it, it's working.

If you run into any problems, good places to look are:

The cx_Oracle mailing list archives: here
And Google.

Cheers!


Wednesday, July 26, 2006

SHFS - Mounting remote directories over SSH on Debian

shfs is a kernel module that lets you mount remote directories over ssh; in other words, you can have a directory from your server in a data center mounted on your workstation via ssh, and then access that directory as if it were local to your machine.

First we have to install a few packages:

#aptitude install shfs-source shfs-utils module-assistant

The module always has to be downloaded as source because it needs to be compiled to match your kernel version.
Now let's build the module:

#module-assistant build shfs

A screen will open showing the module compilation process; when it finishes, we install the module:

#module-assistant install shfs

Done, the module is now installed. Next we need to create a local directory on which to mount the server's directory:

#mkdir /mnt/dir_servidor
#shfsmount root@ip_servidor:/data/backups /mnt/dir_servidor
Password:

After you enter the password, it mounts the directory.
To have this directory mounted every time you turn on the machine, add this as the last line of your /etc/fstab:

root@ip_servidor:/data/backups /mnt/dir_servidor shfs rw,user,noauto 0 0

Or, if you'd rather not use fstab, add this line to the end of /etc/init.d/bootmisc.sh:

shfsmount root@ip_servidor:/data/backups /mnt/dir_servidor

We also have to add the shfs module to /etc/modules so it is loaded automatically as well.

And there you go: it will be mounted every time you restart the machine. But as you may have noticed, it asks for a password, so we have to make it authenticate automatically.
The command below generates a 2048-bit RSA key, which we'll use to authenticate to the server. Press Enter at every prompt, including the passphrase:

#ssh-keygen -t rsa -b 2048

After that, we have to copy the key to the server so that it lets us use ssh without typing the password every time:

#ssh-copy-id -i ~/.ssh/id_rsa.pub root@ip_servidor
Password:

Type the password and that's it: from now on, whenever the directory is mounted, no password will be asked.


That's all folks!

Monday, July 24, 2006

Fixing GPG error messages when using APT

Have you ever run into a message like this when updating apt's package list?

W: GPG error: http://www.debian-multimedia.org testing Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 07DC563D1F41B907
W: You may want to run apt-get update to correct these problems

This happens because you haven't added the server's public key to apt's keyring to make it trusted.

This command imports the debian-multimedia.org key:
#gpg --keyserver hkp://wwwkeys.eu.pgp.net --recv-keys 1F41B907
#gpg --armor --export 1F41B907 | apt-key add -

This one imports the official Debian key:
#gpg --keyserver keyring.debian.org --recv-keys 010908312D230C5F
#gpg --armor --export 2D230C5F | apt-key add -


That's it!!