Best linux questions in June 2012

Can I gain root permission without leaving vim?

10 votes

I often edit files that require root permission (for example, the files under /etc) but forget to start vim with sudo. When I finish editing and type :wq to save and quit, it fails, even with :wq!, because the file is read-only. If I quit and reopen the file, all my work is lost, but if I stay, I cannot save my edits. So, can I gain root permission without leaving vim?

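To force a save, use the following command:

:w !sudo tee %

It will prompt you for your password. Here :w !cmd pipes the buffer to the command's standard input instead of writing the file directly, % expands to the current file name, and tee, running as root through sudo, writes the buffer to that file.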

Get large list of files, sorted by file time in *milliseconds*

9 votes

I know my file system is storing the file modification time in milliseconds but I don't know of a way to access that information via PHP. When I do an ls --full-time I see this:

-rw-r--r-- 1 nobody nobody 900 2012-06-29 14:08:37.047666435 -0700 file1
-rw-r--r-- 1 nobody nobody 900 2012-06-29 14:08:37.163667038 -0700 file2

I'm assuming that the numbers after the dot are the milliseconds.

So I realize I could just use ls and have it sort by modification time, like this:

$filelist = `ls -t`;

However, the directory sometimes has a massive number of files and I've noticed that ls can be pretty slow in those circumstances.

So instead, I've been using find but it doesn't have a switch for sorting results by modification time. Here's an example of what I'm doing now:

$filelist = `find $dir -type f -printf "%T@ %p\n" | sort -n | awk '{print $2}'`;

And, of course, this doesn't sort down to the milliseconds so files that were created in the same second are sometimes listed in the wrong order.

Only a few filesystems (like EXT4) actually store these times with up to nanosecond precision. It's not something that's guaranteed to be available: on other filesystems (like EXT3) you'll notice that the fractional part is .000000000

Now, if this feature is really important for you, you could write a specialized PHP extension. This will bypass the calls to external utilities and should be a great deal faster. The process of creating an extension is well explained in many places, like here. A reasonable approach to such an extension could be an alternative fstat function implementation that exposes the high-precision fields available in the stat structure defined in /usr/include/bits/stat.h nowadays.

As usual, nothing is free. This extension will have to be maintained, it's probably not possible to get it running on hosted environments, etc. Plus, your PHP solution will only run on servers where your extension was deployed (although that can be circumvented by falling back on the ls based technique if the extension is not detected).
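For illustration only, here is what the same idea looks like in a runtime that already exposes the high-precision stat fields. This is a minimal sketch in Python rather than PHP, and it is not the extension described above. The function name files_by_mtime_ns is made up for this example, and unlike the find command in the question it does not recurse into subdirectories.

```python
import os


def files_by_mtime_ns(directory):
    """Return paths of regular files in `directory`, oldest first.

    Sorting uses st_mtime_ns, the modification time as an integer number
    of nanoseconds, so files changed within the same second keep their
    true order (provided the filesystem stores sub-second timestamps).
    """
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            # Skip directories and symlinks, like `find -type f`.
            if entry.is_file(follow_symlinks=False):
                mtime_ns = entry.stat(follow_symlinks=False).st_mtime_ns
                entries.append((mtime_ns, entry.path))
    entries.sort()
    return [path for _, path in entries]


if __name__ == "__main__":
    for path in files_by_mtime_ns("."):
        print(path)
```

On a filesystem that only stores whole seconds, the nanosecond part is zero and ties fall back to path order.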

Watch a memory range in gdb?

8 votes

I am debugging a program in gdb and I want the program to stop when the memory region 0x08049000 to 0x0804a000 is accessed. When I try to set memory breakpoints manually, gdb does not seem to support more than two locations at a time.

(gdb) awatch *0x08049000
Hardware access (read/write) watchpoint 1: *0x08049000
(gdb) awatch *0x08049001
Hardware access (read/write) watchpoint 2: *0x08049001
(gdb) awatch *0x08049002
Hardware access (read/write) watchpoint 3: *0x08049002
(gdb) run
Starting program: /home/iblue/git/some-code/some-executable
Warning:
Could not insert hardware watchpoint 3.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.

There is already a question where this has been asked, and the answer was that it may be possible to do this with valgrind. Unfortunately the answer does not contain any examples or a reference to the valgrind manual, so it was not very enlightening: How can gdb be used to watch for any changes in an entire region of memory?

So: How can I watch the whole memory region?

If you use GDB 7.4 together with Valgrind 3.7.0, then you have unlimited "emulated" hardware watchpoints.

Start your program under Valgrind with the arguments --vgdb=full --vgdb-error=0, then use GDB to connect to it (target remote | vgdb). You can then watch, awatch or rwatch a memory range, e.g. with rwatch (char[100]) *0x5180040

See the Valgrind user manual on gdb integration for more details.

Is kernel/sched.c/context_switch() guaranteed to be invoked every time a process is switched in?

8 votes

I want to alter the Linux kernel so that every time the current PID changes - i.e., a new process is switched in - some diagnostic code is executed (detailed explanation below, if curious). I did some digging around, and it seems that every time the scheduler chooses a new process, the function context_switch() is called, which makes sense (this is just from a cursory analysis of sched.c/schedule() ).

The problem is, the Linux scheduler is basically black magic to me right now, so I'd like to know if that assumption is correct. Is it guaranteed that, every time a new process is selected to get some time on the CPU, the context_switch() function is called? Or are there other places in the kernel source where scheduling could be handled in other situations? (Or am I totally misunderstanding all this?)

To give some context, I'm working with the MARSS x86 simulator trying to do some instrumentation and measurement of certain programs. The problem is that my instrumentation needs to know which executing process certain code events correspond to, in order to avoid misinterpreting the data. The idea is to use some built-in message passing systems in MARSS to pass the PID of the new process on every context switch, so it always knows what PID is currently in execution. If anyone can think of a simpler way to accomplish that, that would also be greatly appreciated.

Yes, you are correct.

schedule() calls context_switch(), which is responsible for switching from one task to another once schedule() has selected the new process.

context_switch() basically does two things. It calls switch_mm() and switch_to().

switch_mm() - switch to the virtual memory mapping for the new process

switch_to() - switch the processor state from the previous process to the new process (save/restore registers, stack info and other architecture specific things)

As for your approach, I guess it's fine. It's important to keep things nice and clean when working with the kernel, and try to keep it relatively easy until you gain more knowledge.

The address in Kernel

7 votes

I have a question about locating an address in the kernel. I inserted a hello module into the kernel, and in this module I put the following:

char mystring[]="this is my address";
printk("<1>The address of mystring is %p",virt_to_phys(mystring));

I expected this to give me the physical address of mystring, but the address printed in syslog is 0x38dd0000. However, when I dumped the memory I found that its real address is dcd2a000, which is quite different from the printed one. How can this be explained? Did I do something wrong? Thanks

PS: I used a tool to dump the whole memory, physical addresses.

According to the man page for virt_to_phys:

The returned physical address is the physical (CPU) mapping for the memory address given. It is only valid to use this function on addresses directly mapped or allocated via kmalloc.

This function does not give bus mappings for DMA transfers. In almost all conceivable cases a device driver should not be using this function.

Try allocating the memory for mystring using kmalloc first:

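char *mystring = kmalloc(19, GFP_KERNEL);  /* 18 characters plus the terminating NUL */
if (!mystring)
    return -ENOMEM;  /* assumes this runs in the module's init function */
strcpy(mystring, "this is my address");  /* use the kernel implementation of strcpy */
/* virt_to_phys() returns a phys_addr_t, not a pointer, so cast it instead of using %p */
printk(KERN_ALERT "The address of mystring is 0x%llx\n", (unsigned long long)virt_to_phys(mystring));
kfree(mystring);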

Here is an implementation of strcpy found here:

char *strcpy(char *dest, const char *src)
{
    char *tmp = dest;

    while ((*dest++ = *src++) != '\0')
            /* nothing */;
    return tmp;
}

what is the algorithm behind the factor command in linux?

7 votes

The factor command prints the prime factors of the specified integer NUMBER.

When I tried it with

factor 12345678912345678912

it returned the result within milliseconds, even for such a big number.

Which algorithm is it using?

The GNU coreutils manual states that Pollard's rho algorithm is used.

http://www.gnu.org/software/coreutils/manual/html_node/factor-invocation.html
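To give a feel for why this is fast, here is a textbook sketch of Pollard's rho in Python. It is not the coreutils implementation, which is written in C and differs in its details. The helper names (is_prime, pollard_rho, factor), the Miller-Rabin primality test and the Floyd cycle detection are choices made for this example only.

```python
import math
import random

_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime(n):
    """Miller-Rabin test; these fixed bases are deterministic for 64-bit n."""
    if n < 2:
        return False
    for p in _SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def pollard_rho(n):
    """Return a non-trivial factor of a composite n."""
    for p in _SMALL_PRIMES:
        if n % p == 0:
            return p
    while True:
        c = random.randrange(1, n)
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n  # tortoise: one step of x -> x^2 + c
            y = (y * y + c) % n  # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:  # d == n means this walk failed; retry with new values
            return d


def factor(n):
    """Return the prime factors of n (n >= 1) in ascending order."""
    if n == 1:
        return []
    if is_prime(n):
        return [n]
    d = pollard_rho(n)
    return sorted(factor(d) + factor(n // d))


if __name__ == "__main__":
    print(factor(12345678912345678912))
```

The expected number of steps grows roughly with the square root of the smallest prime factor, which is why numbers of this size factor almost instantly.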

How can I get the data sent to printer by attaching linux pc to the printer side of the wire?

6 votes

I have a very very old PC that is running DOS and using an ISA card to receive the data from an old fashioned testing device. What I want to do is attach the Printer (LPT) wire to the old PC and attach another PC with linux to the other side of the wire. The linux pc should behave like its a printer device so it can receive the data which should be printed. Following up I want to interpret this data,...

You don't even need Linux on the other machine.

MS-DOS comes bundled with two utilities called INTERLNK and INTERSVR.

You can use them to share files from DOS over a parallel port.

Here is a guide to connecting two PCs in DOS mode:

http://www.pcxt-micro.com/dos-interlink.html

Since your PC is quite old, this might not work for you, because INTERLNK and INTERSVR are available with MS-DOS 6.22 and later versions only.

I would suggest using a bootable floppy to get the correct MS-DOS version and then following the fairly simple guide linked above.

P.S.: Make sure you have the correct cables. You can read the "whole" discussion in the comments here to understand which cable can be used for what.

http://www.computing.net/answers/dos/dcc-connection-in-dos/16366.html

Hope this helps!

what does 'low memory' mean in linux

6 votes

Hi, I'm Korean and I'm getting a little confused by "The boot program first copies itself to a fixed high-memory address to free up low memory for the operating system".

What I found about low memory by googling is that it is the first 640K of memory on a DOS system. Does this mean that all of the OS (like the kernel) goes into low memory (640K)?

Thanks for reading this.

This link could be helpful: Virtual Memory

Mainly,

On 32-bit systems, memory is now divided into "high" and "low" memory. Low memory continues to be mapped directly into the kernel's address space, and is thus always reachable via a kernel-space pointer. High memory, instead, has no direct kernel mapping. When the kernel needs to work with a page in high memory, it must explicitly set up a special page table to map it into the kernel's address space first. This operation can be expensive, and there are limits on the number of high-memory pages which can be mapped at any particular time.

This question on unix.stackexchange is a little more in-depth: High and low memory

how to detect if a thread or process is getting starved due to OS scheduling

6 votes

This is on Linux OS. App is written in C++ with ACE library.

I suspect that one of the threads in the process sometimes gets blocked for an unusually long time (5 to 40 seconds). The app runs fine most of the time, but a couple of times a day it has this issue. There are 5 other similar apps running on the box, which are also I/O bound due to heavy incoming socket data.

I would like to know if there is anything I can do programmatically to see whether the thread/process is getting its time slice.

If a process is being starved out, self monitoring for that process would not be that productive. But, if you just want that process to notice it hasn't been run in a while, it can call times periodically and compare the relative difference in elapsed time with the relative difference in scheduled user time (you would sum the tms_utime and tms_cutime fields if you want to count waiting for children as productive time, and you would sum in the tms_stime and tms_cstime fields if you count kernel time spent on your behalf to be productive time). For thread times, the only way I know of is to consult the /proc filesystem.

A high priority external process or high priority thread could externally monitor processes (and threads) of interest by reading the appropriate /proc/<pid>/stat entries for the process (and /proc/<pid>/task/<tid>/stat for the threads). The user times are found in the 14th and 16th fields of the stat file. The system times are found in the 15th and 17th fields. (The field positions are accurate for my Linux 2.6 kernel.)

Between two time points, you determine the amount of elapsed time that has passed (a monitor process or thread would usually wake up at regular intervals). Then the difference between the cumulative processing times at each of those time points represents how much time the thread of interest got to run during that time. The ratio of processing time to elapsed time would represent the time slice.

One last bit of info: On Linux, I use the following to obtain the tid of the current thread for examining the right task in the /proc/<pid>/task/ directory:

tid = syscall(__NR_gettid);

I do this, because I could not find the gettid system call actually exported by any library on my system, even though it was documented. But, it might be available on yours.
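The external monitor described above can be sketched as follows. This is a minimal Python illustration, not the C++ code the question would use, and the function names (cpu_ticks, read_ticks, cpu_share) are made up for this example. It assumes the Linux /proc stat layout given above, with user time in field 14 and system time in field 15.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (fields 14 and 15) from a /proc stat line."""
    # Field 2, the command name, is wrapped in parentheses and may itself
    # contain spaces or parentheses, so only split what follows the last ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
    return int(rest[11]) + int(rest[12])


def read_ticks(pid, tid):
    """Read the cumulative CPU ticks of one thread of a process."""
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        return cpu_ticks(f.read())


def cpu_share(pid, tid, interval=1.0):
    """Fraction of `interval` seconds the thread actually spent on a CPU."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read_ticks(pid, tid)
    start = time.monotonic()
    time.sleep(interval)
    after = read_ticks(pid, tid)
    elapsed = time.monotonic() - start
    return (after - before) / ticks_per_second / elapsed


if __name__ == "__main__":
    # For the main thread, the tid equals the pid.
    print(cpu_share(os.getpid(), os.getpid(), interval=0.5))
```

A low share on its own does not prove starvation, since a thread blocked on I/O also accumulates no CPU time, so the number is best read alongside what the thread was supposed to be doing. In Python 3.8 and later, threading.get_native_id() returns the kernel thread id that the gettid call above provides.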

How to find file in each directory with the highest number as filename?

6 votes

I have a file structure that looks like this

./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin

and I would like to find the path of the .bin file in each directory that has the highest number as its filename. So the output I am looking for would be

./501.res/1.bin
./503.res/2.bin
./504.res/1.bin

The highest number a file can have is 9.

Question

How do I do that in BASH?

I have come as far as find .|grep bin|sort

What about using awk? You can get the FIRST occurrence really simply:

[ghoti@pc ~]$ cat data1
./501.res/1.bin
./503.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' data1
./501.res/1.bin
./503.res/1.bin
./504.res/1.bin
[ghoti@pc ~]$ 

To get the last occurrence you could pipe through a couple of sorts:

[ghoti@pc ~]$ sort -r data1 | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort
./501.res/1.bin
./503.res/2.bin
./504.res/1.bin
[ghoti@pc ~]$ 

Given that you're using "find" and "grep", you could probably do this:

find . -name \*.bin -type f -print | sort -r | awk 'BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1' | sort

How does this work?

The find command has many useful options, including the ability to select your files by glob, select the type of file, etc. Its output you already know, and that becomes the input to sort -r.

First, we sort our input data in reverse (sort -r). This ensures that within any directory, the highest numbered file will show up first. That result gets fed into awk. FS is the field separator, which makes $2 into things like "/501", "/503", etc. Awk scripts have sections in the form of condition {action} which get evaluated for each line of input. If a condition is missing, the action runs on every line. If "1" is the condition and there is no action, it prints the line. So this script is broken out as follows:

  • a[$2] {next} - If the array a with the subscript $2 (i.e. "/501") exists, just jump to the next line. Otherwise...
  • {a[$2]=1} - set the array a subscript $2 to 1, so that in future the first condition will evaluate as true, then...
  • 1 - print the line.

The output of this awk script will be the data you want, but in reverse order. The final sort puts things back in the order you'd expect.

Now ... that's a lot of pipes, and sort can be a bit resource hungry when you ask it to deal with millions of lines of input at the same time. This solution will be perfectly sufficient for small numbers of files, but if you're dealing with large quantities of input, let us know, and I can come up with an all-in-one awk solution (that will take longer than 60 seconds to write).

UPDATE

Per Dennis' sage advice, the awk script I included above could be improved by changing it from

BEGIN{FS="."} a[$2] {next} {a[$2]=1} 1

to

BEGIN{FS="."} $2 in a {next} {a[$2]} 1

While this is functionally identical, the advantage is that you simply define array members rather than assigning values to them, which may save memory or cpu depending on your implementation of awk. At any rate, it's cleaner.
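For comparison, the same selection can be done in a single pass without sorting the full input first. The following is a minimal sketch in Python rather than awk, with a made-up function name (highest_per_dir). It assumes every file of interest is named as a plain number followed by .bin, as in the question.

```python
import os


def highest_per_dir(paths):
    """Return, for each directory, the path of its highest-numbered .bin file."""
    best = {}  # directory -> (number, path) of the best candidate seen so far
    for path in paths:
        directory, name = os.path.split(path)
        stem, ext = os.path.splitext(name)
        if ext != ".bin" or not stem.isdigit():
            continue  # ignore anything that is not <number>.bin
        number = int(stem)
        if directory not in best or number > best[directory][0]:
            best[directory] = (number, path)
    # Only the winners are sorted, one entry per directory.
    return sorted(path for _, path in best.values())


if __name__ == "__main__":
    data = [
        "./501.res/1.bin",
        "./503.res/1.bin",
        "./503.res/2.bin",
        "./504.res/1.bin",
    ]
    for path in highest_per_dir(data):
        print(path)
```

Because the numbers are compared as integers, this also works if the file numbers ever grow beyond 9, where a plain text sort would put 10.bin before 2.bin.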

Cross-platform unicode in C/C++: Which encoding to use?

6 votes

I'm currently working on a hobby project (C/C++) which is supposed to work on both Windows and Linux, with full support for Unicode. Sadly, Windows and Linux use different encodings making our lives more difficult.

In my code I'm trying to use the data as universal as possible, making it easy for both Windows and Linux. In Windows, wchar_t is encoded as UTF-16 by default, and as UCS-4 in Linux (correct me if I'm wrong).

My software opens ({_wfopen, UTF-16, Windows},{fopen, UTF-8, Linux}) and writes data to files in UTF-8. So far it's all doable. Until I decided to use SQLite.

SQLite's C/C++ interface allows for one or two-byte encoded strings (click). Of course this does not work with wchar_t in Linux, as wchar_t in Linux is 4 bytes by default. Therefore, writing to and reading from sqlite requires conversion for Linux.

Currently the code is cluttering up with exceptions for Windows/Linux. I was hoping to stick to the standard idea of storing data in wchar_t:

  • wchar_t in Windows: Filepaths without a problem, reading/writing to sqlite without a problem. Writing data to a file should be done in UTF-8 anyway.
  • wchar_t in Linux: Exception for the filepaths due to UTF-8 encoding, conversion before reading/writing to sqlite (wchar_t), and the same for windows when writing data to a file.

After reading (here) I was convinced I should stick to wchar_t in Windows. But after getting all that to work, the trouble began with porting to Linux.

Currently I'm thinking of redoing it all to stick with simple char(UTF-8) because it works with both Windows and Linux, keeping the fact in mind that I need to 'WideCharToMultiByte' every string in Windows to achieve UTF-8. Using simple char* based strings will greatly reduce the number of exceptions for Linux/Windows.

Do you have any experience with unicode for cross-platform? Any thoughts about the idea of simply storing data in UTF-8 instead of using wchar_t?

UTF-8 on all platforms, with just-in-time conversion to UTF-16 for Windows is a common tactic for cross-platform Unicode.
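As a rough illustration of that tactic, the sketch below keeps text as UTF-8 bytes everywhere and converts to UTF-16 only at the point where a wide-character API would be called. It is written in Python for brevity, and the helper names (utf8_to_utf16le, utf16le_to_utf8) are made up for this example. In C or C++ on Windows the equivalent conversions would be done with MultiByteToWideChar and WideCharToMultiByte.

```python
def utf8_to_utf16le(data):
    """Convert UTF-8 bytes to UTF-16 (little endian) just before a wide API call."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data):
    """Convert UTF-16 (little endian) bytes coming back from a wide API to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# 'a', e-acute (U+00E9) and a musical G clef (U+1D11E, outside the BMP).
text = "a\u00e9\U0001D11E"

stored = text.encode("utf-8")       # what the program keeps internally
wide = utf8_to_utf16le(stored)      # what a Windows wide-character API would receive

# Byte lengths in UTF-8, UTF-16 and UTF-32 for the same three characters.
print(len(stored), len(wide), len(text.encode("utf-32-le")))  # prints "7 8 12"
```

The third number is the size a 4-byte wchar_t representation would need for the same text, which is the mismatch the question ran into on Linux.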

Simulating /dev/random on Windows

5 votes

I'm trying to port Python code from Linux to Windows right now. In various places random numbers are generated by reading from /dev/random. Is there a way to simulate /dev/random on Windows?

I'm looking for a solution that would keep the code useable on linux...

If you are using Python, why do you care about the specific implementation? Just use the random module and let it deal with it.

Beyond that, (if you can't rely on software state) os.urandom provides os-based random values:

On a UNIX-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom.

(Note that random.SystemRandom provides a nice interface for this).
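A minimal sketch of both calls, which runs unchanged on Linux and Windows. The variable names are only for illustration.

```python
import os
import random

# 16 bytes straight from the operating system's random source.
token = os.urandom(16)

# The familiar random-module interface, backed by the same OS source.
rng = random.SystemRandom()
roll = rng.randint(1, 6)  # inclusive on both ends

print(token.hex(), roll)
```

Code that previously read N bytes from /dev/random can usually call os.urandom(N) instead, as long as it does not depend on the blocking behaviour of /dev/random.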

If you are really serious about it being cryptographically random, you might want to check out PyCrypto.