Best linux questions in April 2011

Is it safe to parse a /proc/ file ?

73 votes

Well, this is going to be a short one...

I want to parse /proc/net/tcp/, but is it safe? I mean, how to open and read it and not be afraid, that some other process (or the OS) will be changing it in the same time?

Although the files in /proc appear as regular files in userspace, they are not really files but rather entities that support the standard file operations from userspace (open, read, close). Note that this is quite different than having an ordinary file on disk that is being changed by the kernel.

All the kernel does is print its internal state into its own memory using a sprintf-like function, and that memory is copied into userspace whenever you issue a read(2) system call.

The kernel handles these calls in an entirely different way than for regular files, which could mean that the entire snapshot of the data you will read could be ready at the time you open(2) it, while the kernel makes sure that concurrent calls are consistent and atomic. I haven't read that anywhere, but it doesn't really make sense to be otherwise.

My advice is to take a look at the implementation of a proc file in your particular Unix flavour. This is really an implementation issue (as is the format and the contents of the output) that is not governed by a standard.

The simplest example would be the implementation of the uptime proc file in Linux: http://lxr.free-electrons.com/source/fs/proc/uptime.c. Note how the entire buffer is produced in the callback function supplied to single_open.

How do I start and stop a Linux program using the subprocess module in Python?

14 votes

I’m writing a web app that uses Selenium to screen-scrape another website. This screen-scraping only happens once a day, so I’d rather not leave Selenium and Xvfb running all the time.

I’m trying to figure out how to start Xvfb and Selenium from Python, and then stop them once the screen-scraping’s done.

If I was doing it manually, I’d start them at the command line, and hit CTRL C to stop them. I’m trying to do the same thing from Python.

I seem to be able to successfully start Xvfb like this:

xvfb = Popen('Xvfb :99 -nolisten tcp', shell=True)

But when I’ve tried to terminate it:

xvfb.terminate()

and then tried to start it again (by repeating my initial command), it tells me it’s already running.

I don't know why you want to run Xvfb as root. Your usual X server only needs to run as root (on many but not all unices) only so that it can access the video hardware; that's not an issue for Xvfb by definition.

tempdir = tempfile.mkdtemp()
xvfb = subprocess.Popen(['Xvfb', ':99', '-nolisten', 'tcp', '-fbdir', tempdir])

When you terminate the X server, you may see a zombie process. This is in fact not a process (it's dead), just an entry in the process table that goes away when the parent process either reads the child's exit status or itself dies. Zombies are mostly harmless, but it's cleaner to call wait to read the exit status.

xvfb.terminate()
# At this point, `ps -C Xvfb` may still show a running process
# (because signal delivery is asynchronous) or a zombie.
xvfb.wait()
# Now the child is dead and reaped (assuming it didn't catch SIGTERM).

Why is CUDA pinned memory so fast?

8 votes

I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On linux, the underlying system call for achieving this is mlock. From the man page of mlock, it states that locking the page prevents it from being swapped out:

mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;

In my tests, I had a fews gigs of free memory on my system so there was never any risk that the memory pages could've been swapped out yet I still observed the speedup. Can anyone explain what's really going on here?, any insight or info is much appreciated.

CUDA Driver checks, is memory range is locked or not and then it will use different codepath. Locked memory is stored in the physical memory, so device can fetch it w/o help from CPU (DMA, aka Async copy; device only need list of physical pages). Not-locked memory can generate a page fault on access, and it is stored not only in memory, so driver need to access every page of non-locked memory, copy it into pinned buffer and pass it to DMA (Syncronious, page-by-page copy).

As described here http://forums.nvidia.com/index.php?showtopic=164661

host memory used by the asynchronous mem copy call needs to be page locked through cudaMallocHost or cudaHostAlloc.

I can also recommend to check cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. HostAlloc says that cuda driver can detect pinned memory:

The driver tracks the virtual memory ranges allocated with this(cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().

list_entry in Linux

7 votes

hello everyone, I found in

user/include/linux/list.h

this declaration:

#define list_entry(ptr, type, member) \
((type *)((char *)(ptr) – (unsigned long)(&((type *)0)->member)))

can somebody please explain what is this and how does it work, thanks in advance

P.S. please simplify your answer as much as possible, I know about threads, processes in Linux, now I'm exploring possibilities and I'm a little bit stuck with this one.

Consider two structs like this:

struct data {
    int something;
};

struct container {
    int something_before;
    struct data data_item;
    int something_after;
};

Assume you have a pointer to a struct data value:

struct data *data_ptr;

The list_entry() macro helps you to convert data_ptr to a pointer to the struct container value that holds the struct data value, pointed to by ptr:

struct container *cont_ptr = list_entry(data_ptr, struct container, data_item);

The macro works by computing the offset of data_item inside the struct container, and subtracting that many bytes from the data_ptr pointer. This, when cast to struct container *, gives a valid pointer to the struct container that holds this particular struct data "inside".

The macro can also be simplified a bit by using the builtin offsetof() macro:

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) – offsetof(type, member)))

buffered asynchronous file I/O on linux

7 votes

I am looking for the most efficient way to do asynchronous file I/O on linux.

The POSIX glibc implementation uses threads in userland.

The native aio kernel api only works with unbuffered operations, patches for the kernel to add support for buffered operations exist, but those are >3 years old and no one seems to care about integrating them into the mainline.

I found plenty of other ideas, concepts, patches that would allow asynchronous I/O, though most of them in articles that are also >3 years old. What of all this is really available in todays kernel? I've read about servlets, acalls, stuff with kernel threads and more things I don't even remember right now.

What is the most efficient way to do buffered asynchronous file input/output in todays kernel?

Unless you want to write your own IO thread pool, the glibc implementation is an acceptable solution. It actually works surprisingly well for something that runs entirely in userland.

The kernel implementation does not work with buffered IO at all in my experience (though I've seen other people say the opposite!). Which is fine if you want to read huge amounts of data via DMA, but of course it sucks big time if you plan to take advantage of the buffer cache.
Also note that the kernel AIO calls may actually block. There is a limited size command buffer, and large reads are broken up into several smaller ones. Once the queue is full, asynchronous commands run synchronously. Surprise. I've run into this problem a year or two ago and could not find an explanation. Asking around gave me the "yeah of course, that's how it works" answer.
From what I've understood, the "official" interest in supporting buffered aio is not terribly great either, despite several working solutions seem to be available for years. Some of the arguments that I've read were on the lines of "you don't want to use the buffers anyway" and "nobody needs that" and "most people don't even use epoll yet". So, well... meh.

Being able to get an epoll signalled by a completed async operation was another issue until recently, but in the meantime this works really fine via eventfd.

Note that the glibc implementation will actually spawn threads on demand inside __aio_enqueue_request. It is probably no big deal, since spawning threads is not that terribly expensive any more, but one should be aware of it. If your understanding of starting an asynchronous operation is "returns immediately", then that assumption may not be true, because it may be spawning some threads first.

EDIT:
As a sidenote, under Windows there exists a very similar situation to the one in the glibc AIO implementation where the "returns immediately" assumption of queuing an asynchronous operation is not true.
If all data that you wanted to read is in the buffer cache, Windows will decide that it will instead run the request synchronously, because it will finish immediately anyway. This is well-documented, and admittedly sounds great, too. Except in case there are a few megabytes to copy or in case another thread has page faults or does IO concurrently (thus competing for the lock) "immediately" can be a surprisingly long time -- I've seen "immediate" times of 2-5 milliseconds. Which is no problem in most situations, but for example under the constraint of a 16.66ms frame time, you probably don't want to risk blocking for 5ms at random times. Thus, the naive assumption of "can do async IO from my render thread no problem, because async doesn't block" is flawed.

Regarding Hard Link

7 votes

Can somebody please explain me why the kernel doesn't allow us to make a hard link to a directory. Whether it is because it breaks the rule of directed acyclic graph structure of the file-system or it is because of some other reason. What other complications come if it allows that?

Back in the days of 7th Edition (or Version 7) UNIX, there were no system calls mkdir(2) and rmdir(2). The mkdir(1) program was SUID root, and used the mknod(2) system call to create the directory and the link(2) system call to make the entries for . and .. in the new directory. The link(2) system call only allowed root to do that. Consequently, way back then (circa 1978), it was possible for the superuser to create links to directories, but only the superuser was permitted to do so to ensure that there were no problems with cycles or other missing links. There were diagnostic programs to pick up the pieces if the system crashed while a directory was partly created, for example.


You can find the Unix 7th Edition manuals at Bell Labs. Sections 2 and 3 are devoid of mkdir(2) and rmdir(2). You used the mknod(2) system call to make the directory:

NAME

mknod – make a directory or a special file

SYNOPSIS

mknod(name, mode, addr)
char *name;

DESCRIPTION

Mknod creates a new file whose name is the null-terminated string pointed to by name. The mode of the new file (including directory and special file bits) is initialized from mode. (The protection part of the mode is modified by the process’s mode mask; see umask(2)). The first block pointer of the i-node is initialized from addr. For ordinary files and directories addr is normally zero. In the case of a special file, addr specifies which special file.

Mknod may be invoked only by the super-user.

SEE ALSO

mkdir(1), mknod(1), filsys(5)

DIAGNOSTICS

Zero is returned if the file has been made; – 1 if the file already exists or if the user is not the superuser.

The entry for link(2) states:

DIAGNOSTICS

Zero is returned when a link is made; – 1 is returned when name1 cannot be found; when name2 already exists; when the directory of name2 cannot be written; when an attempt is made to link to a directory by a user other than the super-user; when an attempt is made to link to a file on another file system; when a file has too many links.

The entry for unlink(2) states:

DIAGNOSTICS

Zero is normally returned; – 1 indicates that the file does not exist, that its directory cannot be written, or that the file contains pure procedure text that is currently in use. Write permission is not required on the file itself. It is also illegal to unlink a directory (except for the super-user).

The manual page for the ln(1) command noted:

It is forbidden to link to a directory or to link across file systems.

The manual page for the mkdir(1) command notes:

Standard entries, '.', for the directory itself, and '..' for its parent, are made automatically.

This would not be worthy of comment were it not that it was possible to create directories without those links.


Nowadays, the mkdir(2) and rmdir(2) system calls are standard and permit any user to create and remove directories, preserving the correct semantics. There is no longer a need to permit users to create hard links to directories. This is doubly true since symbolic links were introduced - they were not in 7th Edition UNIX, but were in the BSD versions of UNIX from quite early on.


With normal directories, the .. entry unambiguously links back to the (single, solitary) parent directory. If you have two hard links (two names) for the same directory in different directories, where does the .. entry point? Presumably, to the original parent directory - and presumably there is no way to get to the 'other' parent directory from the linked directory. That's an asymmetry that can cause trouble. Normally, if you do:

chdir("./subdir");
chdir("..");

(where ./subdir is not a symbolic link), then you will be back in the directory you started from. If ./subdir is a hard link to a directory somewhere else, then you will be in a different directory from where you started after the second chdir(). You'd have to show that with a pair of stat() calls before and after the chdir() operations shown.

Should I stick with bash for advanced Linux automation scripts, or switch to Python?

7 votes

I have a very basic knowledge of bash shell programming and can do simple jobs like backups etc.

Now I am thinking of tackling more advanced tasks, and must decide whether I should stick with bash and study more advanced topics, or learn Python.

Since I have to spend time studying either way, I'd like to spend it on the language that'll give me the most advantages.

Related:

Python and/or Perl VS bash
Is there an advantage to using Bash over Perl or Python?
Python vs Bash - In which kind of tasks each one outruns the other performance-wise ?

Bash is pretty powerful when you know it well, but it has a lot of pitfalls, and is thus pretty hard to get right. Bash is also extremely well suited to the task at hand. In particular it lets you parallelize simple things very easily.

Python on the other hand is a much saner language, but it's not specifically designed for your task, so it will feel a bit more verbose at first. Once you develop your small library of functions designed for your needs it should be great, though.

Printing in the same line with a pause in C

6 votes

I want to make my program to print something, then wait for a few seconds and then print something else in the same line. I've tried to write it as:

printf ("bla bla bla");
sleep (2);
printf ("yada yada yada\n");

but in the output I get to wait for 2 seconds and then I get the whole line printed as one. When I tried to put the output in different lines it did print with a pause.

How do I make it to print with a pause in the same line?

*Working on Linux

printf ("bla bla bla");
fflush(stdout);
sleep (2);
printf ("yada yada yada\n");

fflush forces the stdout internal buffer to be flushed to the screen.

where does kernel store processes which are not running?

6 votes

hello, everyone I have some question about tasks in Linux, I know that all tasks which are currently at the state TASK_RUNNING are in data structure called runqueue, but what about the tasks which are waiting for some event (states which are not TASK_RUNNING, for example one which is waiting for the input from keyboard). Do I have some other data structure for such tasks or only general list of tasks? thanks in advance for any explanation

Processes in a TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state are further subdivided in to different classes, each of which corresponds to a specific event. In this state, the process state does not provide enough info to retrieve the process descriptor quickly, so another list of processes called wait_queue are used. Wait_queue implements conditional waits on events. A process waiting for a specific event is placed in the proper wait queue.

Wait queues are implemented as cyclical lists whose elements include pointers to process descriptors. Each element of a wait queue list is of type wait_queue:

struct wait_queue {  
    struct task_struct * task;  
    struct wait_queue * next;  
}; 

How does AppArmor do "Environment Scrubbing"?

5 votes

The AppArmor documentation mentions giving applications the ability to execute other programs with or without enviroment scrubbing. Apparently a scrubbed environment is more secure, but the documentation doesn't seem to specify exactly how environment scrubbing happens.

What is environment scrubbing and what does AppArmor do to scrub the environment?

"Environment scrubbing" is the removal of various "dangerous" environment variables which may be used to affect the behaviour of a binary - for example, LD_PRELOAD can be used to make the dynamic linker pull in code which can make essentially arbitrary changes to the running of a program; some variables can be set to cause trace output to files with well-known names; etc.

This scrubbing is normally performed for setuid/setgid binaries as a security measure, but the kernel provides a hook to allow security modules to enable it for arbitrary other binaries as well.

The kernel's ELF loader code uses this hook to set the AT_SECURE entry in the "auxiliary vector" of information which is passed to the binary. (See here and here for the implementation of this hook in the AppArmor code.)

As execution starts in userspace, the dynamic linker picks up this value and uses it to set the __libc_enable_secure flag; you'll see that the same routine also contains the code which sets this flag for setuid/setgid binaries. (There is equivalent code elsewhere for binaries which are statically linked.)

__libc_enable_secure affects a number of places in the main body of the dynamic linker code, and causes a list of specific environment variables to be removed.