Tuesday, December 25, 2007

xdebug: Function Tracing using Profiling

Every now and then you end up reading other people's code. Thanks to open source, I have had many such occasions, and I often had a really hard time following the code: MediaWiki, Drupal, Owl and dotProject, to name just a few. Xdebug can log all function calls, including parameters and return values, to a file, and the log format can be customized. For example, I may want only the function calls, just to trace the order in which they happen.

xdebug.org has a page on function traces.

You can put the configuration parameters in php.ini or xdebug.ini, but it's not a good idea: these ini files set global configuration, so every application on the server gets traced, not just the one you care about. Tracing makes everything noticeably slower, so be careful. If you are wondering about xdebug.ini, it's a separate config file for the xdebug extension, used instead of writing to php.ini. See my previous post on building xdebug.

.htaccess comes to the rescue when you don't want to touch either your source code or an .ini file.

Create an .htaccess file in the directory of the application you want to trace:
php_value xdebug.auto_trace 1
php_value xdebug.trace_format 0
php_value xdebug.trace_output_dir /var/www/trace
php_value xdebug.trace_options 1
See the xdebug documentation for the meaning of these settings. There are other parameters (xdebug.collect_params and xdebug.collect_return) if you also need to see the arguments passed and the return values.

Make sure that the directory /var/www/trace is writable by your web-server.

Load the php script in your browser. You will see a new file has been created at the output location.

The file can get huge. In the case of MediaWiki, loading index.php creates a trace file of about 3 MB.

TRACE START [2007-12-25 05:24:23]
0.0018 68304 -> {main}() /var/www/testwiki/index.php:0
0.0037 87332 -> require_once(/var/www/testwiki/includes/WebStart.php) /var/www/testwiki/index.php:38
0.0039 87460 -> str_replace() /var/www/testwiki/includes/WebStart.php:10
0.0041 87800 -> ini_get() /var/www/testwiki/includes/WebStart.php:19
0.0043 87800 -> microtime() /var/www/testwiki/includes/WebStart.php:51
0.0045 87860 -> function_exists() /var/www/testwiki/includes/WebStart.php:53
0.0047 87880 -> getrusage() /var/www/testwiki/includes/WebStart.php:54
0.0050 89420 -> ini_set() /var/www/testwiki/includes/WebStart.php:59
0.0052 89420 -> define() /var/www/testwiki/includes/WebStart.php:66
0.0057 91252 -> require_once(/var/www/testwiki/StartProfiler.php) /var/www/testwiki/includes/WebStart.php:69
0.0059 91252 -> dirname() /var/www/testwiki/StartProfiler.php:3
0.0071 101736 -> require_once(/var/www/testwiki/includes/ProfilerStub.php)
.....
.....
2.1464 8033332 -> strlen() /var/www/testwiki/includes/OutputHandler.php:14
2.1466 8033332 -> wfDoContentLength() /var/www/testwiki/includes/OutputHandler.php:14
2.1467 8033332 -> headers_sent() /var/www/testwiki/includes/OutputHandler.php:96
2.1930 74168
TRACE END [2007-12-25 05:24:37]

This file is 27028 lines long. Huge, isn't it? The format is simple, though, and it gives a good idea of which functions are being called and in what order.
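With 27028 lines, reading the trace by eye gets tiring, so a small script can summarize it. The following is only a sketch in Python, assuming every call line looks like the sample above (time, memory, "->", function, file:line). The names parse_trace and count_calls are made up for this example and are not part of xdebug.

```python
import re
from collections import Counter

# One call line in the human-readable trace format shown above:
#   <time> <memory> -> <function>(<args>) <file>:<line>
# This pattern is an assumption based on the sample output in this post.
TRACE_LINE = re.compile(
    r"^\s*(?P<time>\d+\.\d+)\s+(?P<memory>\d+)\s+->\s+"
    r"(?P<function>[^\s(]+)\((?P<args>.*)\)\s+(?P<file>\S+):(?P<line>\d+)\s*$"
)


def parse_trace(lines):
    """Yield (time, memory, function, file, line) for every call line."""
    for raw in lines:
        match = TRACE_LINE.match(raw)
        if match is None:
            # TRACE START/END markers, the final summary line, blank lines
            continue
        yield (
            float(match.group("time")),
            int(match.group("memory")),
            match.group("function"),
            match.group("file"),
            int(match.group("line")),
        )


def count_calls(lines):
    """Count how many times each function name appears in the trace."""
    return Counter(call[2] for call in parse_trace(lines))
```

To use it, open the trace file from /var/www/trace and pass the file object to count_calls(); calling most_common(20) on the result lists the twenty most frequently called functions.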

I just found another convenient way of function tracing by using the profiling feature.

Xdebug's built-in profiler allows you to find bottlenecks in your script and visualize those with an external tool such as KCacheGrind or WinCacheGrind.

While viewing the profiler output, I realized that it's also an excellent way to view the function call trace.

To enable the profiler, replace the content of the .htaccess file with the following lines:
php_value xdebug.profiler_output_dir /var/www/trace
php_value xdebug.profiler_enable 1
Load the PHP script from the browser. You will see a cachegrind.out file being created in the output directory. There are two such files in my case. I wonder why.

Now download WinCacheGrind, which presents the file created above in a very convenient way.

Load the cachegrind.out file in WinCacheGrind. You will notice that the left pane displays the function calls and included files in the same sequence as the function trace output above.


You can browse through the tree of function calls and probably understand the code flow in much less time. You will also see that PHP internal functions are labelled php::functionname, so you can concentrate on user functions. Happy hacking.
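If you don't have WinCacheGrind at hand, the same php:: prefix can be used to pull the user functions out of the profiler file with a few lines of Python. This is only a sketch: it assumes the file contains plain "fn=name" lines as in the callgrind format, and user_functions is a name invented for this example.

```python
def user_functions(lines):
    """Return the distinct user-defined function names, in first-seen order.

    Assumes each function record starts with a plain "fn=<name>" line and
    that PHP internal functions carry the "php::" prefix.
    """
    seen = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("fn="):
            continue  # skips fl=, cfn=, calls= and the cost lines
        name = line[len("fn="):]
        if name.startswith("php::"):
            continue  # PHP internal function
        seen.setdefault(name, True)
    return list(seen)
```

Pass it the open cachegrind.out file. Note that the order is the order in which the records appear in the file, which is not necessarily the call order.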

building xdebug in linux machine

Though the instructions at xdebug.org are good, I still had a bit of a hard time with the details. It took me a while to get xdebug running on my Linux machines. I tried Ubuntu Gutsy server (7.10) and Fedora Core 5, and got xdebug compiled and built on both.

You may also want to see the install instructions at xdebug.org.

First download the xdebug source from xdebug.org. I prefer wget.
#wget http://www.xdebug.org/files/xdebug-2.0.2.tgz

Untar the tgz file and go into the xdebug-2.0.2 folder:
#tar -xzf xdebug-2.0.2.tgz
#cd xdebug-2.0.2

To build the PHP extension you need a few tools such as phpize and php-config, which are part of the PHP development package. Chances are your system does not have this package installed.

#apt-get install php5-dev (in ubuntu)
#yum install php-devel (in fedora)

will take care of the nitty-gritty job of resolving dependencies. It may even ask to upgrade mysql-server along with php, depending on the versions on your machine. Make sure your data is backed up before you say yes to installing and upgrading the dependent packages.

After the installation, phpize should be available. phpize is a shell script that prepares a PHP extension for compiling. Believe me, I couldn't find the purpose of phpize explained anywhere on the internet. Maybe I haven't searched enough :(
Run phpize in your xdebug source directory. It generates several files needed to compile and build the xdebug extension.

#phpize
will give the following output
Configuring for:
PHP Api Version: 20041225
Zend Module Api No: 20050922
Zend Extension Api No: 220051025
You will see that a 'configure' script and several other files have been created.

#./configure --enable-xdebug

See the install instructions if you get any errors. I didn't run into any. Hope you won't either.

#make
will build the extension xdebug.so in the modules directory.

Move the extension to any folder you want. I moved it to /opt/phpmodules/xdebug.so

You may either put the extension loading statement in php.ini or create a new ini file (xdebug.ini) in the php.d directory. Configuration files in this directory (/etc/php5/conf.d in my case) are read automatically and the extensions get loaded.

I wrote the following line to xdebug.ini
zend_extension="/opt/phpmodules/xdebug.so"
Then restart the Apache server.

Load phpinfo (write a PHP script that calls phpinfo()) in the browser. If everything went fine, you should see the xdebug module and the Zend logo in the browser.

#php -m
will also show xdebug under "PHP Modules" and "Zend Modules".
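If you rebuild often, this check is easy to script. Below is a small Python sketch that parses the text printed by php -m. It assumes the output is split into bracketed sections such as [PHP Modules] and [Zend Modules]. Both function names are invented for this example.

```python
def parse_php_modules(output):
    """Split the text printed by `php -m` into {section name: [module names]}."""
    sections = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section header, e.g. "[PHP Modules]"
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def xdebug_loaded(output):
    """True if xdebug is listed under both PHP Modules and Zend Modules."""
    sections = parse_php_modules(output)
    return all(
        any(name.lower() == "xdebug" for name in sections.get(section, []))
        for section in ("PHP Modules", "Zend Modules")
    )
```

The text to feed in can be captured with subprocess.run(["php", "-m"], capture_output=True, text=True).stdout.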

Congratulations. Xdebug is successfully built and installed.

Monday, December 17, 2007

find and delete files at once... the shortcut

I was trying to delete a set of files with the character '-' in their filenames from a directory containing lots of other files. Thanks to Linux, everything you need is already here.

#find ./directorypath -regex '.*-.*'
lists all the files/directories containing '-' anywhere in their path. Note that -regex matches against the whole path, not just the filename.
-regex lets you give any pattern, which makes for powerful custom searches.

To list only the files, add -type f
#find ./directorypath -type f -regex '.*-.*'

To delete all the files/directories matching a pattern, just add a command to the above line as shown below
#find ./directorypath -regex '.*-.*' -exec rm -rf {} \;

-exec rm -rf {} \;
executes rm -rf (remove forcefully, and recursively in the case of a directory). Each file/directory found by find is passed to rm -rf in place of {}, and gets deleted. \; marks the end of the command.

I suggest you first move the files to some other directory before deleting them, just in case, so you don't accidentally delete important files.
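The "move first, delete later" approach can be sketched in Python as follows. This is a rough illustration, not a replacement for find: the function name move_matching and the holding directory are made up for this example, and unlike find's -regex the pattern here is matched against the filename only, not the whole path.

```python
import re
import shutil
from pathlib import Path


def move_matching(root, holding_dir, pattern=r".*-.*"):
    """Move every regular file under root whose name matches pattern.

    Files land in holding_dir with their relative path preserved, so two
    files with the same name in different folders do not overwrite each
    other. Returns the list of new locations.
    """
    root = Path(root)
    holding = Path(holding_dir)
    regex = re.compile(pattern)
    moved = []
    # sorted() materializes the listing before anything is moved
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if holding in path.parents:
            continue  # never re-process files already in the holding area
        if regex.fullmatch(path.name) is None:
            continue
        target = holding / path.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Once you have checked the holding directory and are sure nothing important ended up there, it can be removed in one go.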

iptables... an easy way

I had been trying to do the trivial job of opening the MySQL port on a Fedora server box. I just wanted to use a desktop MySQL client (like the freeware EMS SQL Manager Lite for MySQL) rather than the ubiquitous phpMyAdmin. I never understood the perplexing iptables command and its rules.

The easy way is to copy and paste an existing line in /etc/sysconfig/iptables and change the port number (3306 is the MySQL default).

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3306 -j ACCEPT

Restarting the iptables service does the rest.

service iptables stop
service iptables start
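To confirm from the desktop machine that the port is really reachable before fighting with the MySQL client, a quick TCP check is enough. Here is a minimal Python sketch. The function name port_open is made up for this example, and the host you pass in would be your own server's address.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False
```

Calling port_open with the server address and 3306 should return True once the rule is active and MySQL is listening. Keep in mind that a False result can also mean that MySQL itself is not accepting remote connections, not only that the firewall is blocking the port.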

Sunday, December 16, 2007

Dekiwiki running along with other apps

I finally managed to install and run Dekiwiki on my Ubuntu 7.10 (gutsy) server machine. The problem with Dekiwiki is that it has to be installed in the root directory. If the server is also serving other apps, Dekiwiki just claims the entire Apache server and keeps me from accessing them, even phpMyAdmin.

Once Dekiwiki was working, I made a tweak so that Dekiwiki runs on port 8080 and port 80 serves my default DocumentRoot (/var/www).

Here's my directory structure
/var/www/
------------phpMyAdmin/...
------------dekiwiki/web/...
------------testapps/

I added the following line to /etc/apache2/ports.conf, below the existing Listen 80:

Listen 8080

Now Apache also listens on port 8080.

I changed the deki-apache.conf file included with the Dekiwiki application, making sure that the DocumentRoot is different when serving on port 8080.

Now Dekiwiki is accessed from http://192.168.0.45:8080/
while the other apps are accessed from http://192.168.0.45/

My hours of labor on Dekiwiki seem to have paid off. Here is the modified deki-apache.conf:
<VirtualHost *:80>
DocumentRoot "/var/www"
</VirtualHost>

<VirtualHost *:8080>
ServerName 192.168.0.45

ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log common

DocumentRoot "/var/www/dekiwiki/web"

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/$
RewriteRule ^/$ /index.php?title= [L,NE]

RewriteCond %{REQUEST_URI} !/(@api|editor|skins|config)/
RewriteCond %{REQUEST_URI} !/(redirect|texvc|index|Version).php
RewriteCond %{REQUEST_URI} !/error/(40(1|3|4)|500).html
RewriteCond %{REQUEST_URI} !/favicon.ico
RewriteCond %{REQUEST_URI} !/robots.txt
RewriteCond %{QUERY_STRING} ^$ [OR]
RewriteCond %{REQUEST_URI} ^/Special:Search
RewriteRule ^/(.*)$ /index.php?title=$1 [L,QSA,NE]

# deki-api uses encoded slashes in query parameters so AllowEncodedSlashes must be On
AllowEncodedSlashes On

# mod_proxy rules
ProxyPass /@api http://localhost:8081 retry=1
ProxyPassReverse /@api http://localhost:8081
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1
</VirtualHost>
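The rewrite rules in the port 8080 virtual host are the hardest part of this file to read, so here is my understanding of them written out as a small Python sketch. It is a simplification made under assumptions: it only models the two RewriteRule blocks above, it ignores URL escaping and the proxy rules, and the function name rewrite is invented for this example.

```python
import re

# URIs matching any of these patterns are left alone by the second rule.
# The patterns are copied from the negated RewriteCond lines above.
EXCLUDED = [
    r"/(@api|editor|skins|config)/",
    r"/(redirect|texvc|index|Version).php",
    r"/error/(40(1|3|4)|500).html",
    r"/favicon.ico",
    r"/robots.txt",
]


def rewrite(uri, query=""):
    """Return the (uri, query string) pair after the two rules are applied."""
    # First rule: the bare root goes to index.php with an empty title.
    if uri == "/":
        return "/index.php", "title="
    # Second rule, negated conditions: skip API, skins, static files, etc.
    if any(re.search(pattern, uri) for pattern in EXCLUDED):
        return uri, query
    # Last two conditions are joined by [OR]:
    # empty query string OR a Special:Search page.
    if not (query == "" or uri.startswith("/Special:Search")):
        return uri, query
    # /Some_Page becomes /index.php?title=Some_Page; the QSA flag
    # appends the original query string, if there was one.
    new_query = "title=" + uri[1:]
    if query:
        new_query += "&" + query
    return "/index.php", new_query
```

For example, a request for /Main_Page ends up at index.php with title=Main_Page, while anything under /skins/ is served as a plain file.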

Monday, January 29, 2007

Python: HTML tables to Mediawiki converter

Some ideas and code for html2wiki are borrowed from the html2csv converter. It reads any file and converts the tables, if present, to wiki format. The code, with syntax coloring, can also be viewed in the code snippets.

import re
import sys
from html.parser import HTMLParser


class html2wiki(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.wiki = ''  # The Wiki text
        self.wikirow = ''  # The current Wiki row of table being constructed from HTML
        self.inTD = 0  # Used to track if we are inside or outside a <TD>...</TD> tag.
        self.inTR = 0  # Used to track if we are inside or outside a <TR>...</TR> tag.
        self.re_multiplespaces = re.compile(r'\s+')  # regular expression used to remove spaces in excess
        self.rowCount = 0  # output row counter.
        self.rowspan = ''
        self.colspan = ''
        self.linebreak = '<br>'
        self.data = ''
        self.prop = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'table': self.start_table()
        elif tag == 'tr': self.start_tr()
        elif tag == 'td': self.start_td(attrs)

    def handle_endtag(self, tag):
        if tag == 'table': self.end_table()
        elif tag == 'tr': self.end_tr()
        elif tag == 'td': self.end_td()

    def start_table(self):
        self.wiki += '{| border=1' + self.linebreak
        self.wiki += '|-' + self.linebreak

    def end_table(self):
        self.wiki += '|}' + self.linebreak

    def start_tr(self):
        if self.inTR: self.end_tr()  # <TR> implies </TR>
        self.inTR = 1

    def end_tr(self):
        if self.inTD: self.end_td()  # </TR> implies </TD>
        self.inTR = 0
        if len(self.wikirow) > 0:
            self.wiki += self.wikirow
            self.wiki += '|-' + self.linebreak
            self.wikirow = ''
        self.rowCount += 1

    def start_td(self, attrs):
        if not self.inTR: self.start_tr()  # <TD> implies <TR>
        self.data = ''
        self.prop = ''
        self.rowspan = ''
        self.colspan = ''
        for key, value in attrs:
            if key == 'rowspan':
                self.rowspan = value or ''
            elif key == 'colspan':
                self.colspan = value or ''
        self.inTD = 1

    def end_td(self):
        if self.inTD:
            self.wikirow += '| ' + self.prop + self.re_multiplespaces.sub(' ', self.data.replace('\t', ' ').replace(self.linebreak, '').replace('\r', '').replace('"', '""')) + self.linebreak
            self.data = ''
            self.inTD = 0

    def handle_data(self, data):
        if self.inTD:
            if data.strip() != '':
                # Build the cell attributes only for cells that have content
                self.prop = ''
                if self.rowspan != '':
                    self.prop = ' rowspan = ' + self.rowspan
                if self.colspan != '':
                    self.prop += ' colspan = ' + self.colspan
                if self.prop:
                    self.prop += ' | '
            self.data += data


if __name__ == '__main__':
    parser = html2wiki()
    if len(sys.argv) == 2:
        in_file = open(sys.argv[1], "r")
        text = in_file.read()
        parser.feed(text)
        in_file.close()
        print(parser.wiki)
    else:
        print('Argument - filename required')
Since I need a web interface for users, and I neither want to write a similar app again in PHP nor write CGI in Python, I wrote a tiny PHP script that calls the Python script. I have used TinyMCE so that I can just copy and paste HTML tables directly into the edit box and do the conversion easily.

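<?php
$input = '';
$output = '';
if (isset($_POST['submit'], $_POST['html']) && trim($_POST['html']) != '') {
    // stripslashes() is only needed while magic quotes are enabled
    $input = stripslashes(trim($_POST['html']));

    // Write the pasted HTML to a temporary file for the Python script
    $filename = 'uploads/'.date('YmdHis').'.txt';
    $fp = fopen($filename, 'w');
    fwrite($fp, $input);
    fclose($fp);

    // Run the converter and collect the lines it prints
    exec('python html2wiki.py '.escapeshellarg($filename), $lines, $retval);
    $output = implode("\n", $lines);
    unlink($filename);
}
?>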
<script language="javascript" type="text/javascript" src="/lib/tinymce/jscripts/tiny_mce/tiny_mce.js"></script>
<script language="javascript" type="text/javascript">
tinyMCE.init({
theme:"simple",
mode : "textareas"
});
</script>

<form name='converter' method='post'>
<input type='submit' value = 'Convert Html2Wiki >>' name='submit'><br>
<table>
<tr><td><textarea name='html' cols='50' rows='40'><?=$input?></textarea></td>
<td><textarea name='wiki' cols='50' rows='40'><?=$output?></textarea></td>
</tr></table>
</form>


The output looks something like the screenshot below.
[Screenshot: converter output]

Friday, January 12, 2007

Mediawiki: Reusing the same code for multiple Wikis

Sometimes you may need to install several separate wikis on the same server. One inefficient solution (in terms of time and space) is to install MediaWiki again every time you need a wiki. The other solution is to reuse the wiki code for all of them. You still need a separate database for each wiki, though. For this, dump the fresh database right after the first installation, so that it can be used for any other wikis you later need on the same server.



This has been tried with MediaWiki 1.6.7 only.



1. First install Mediawiki in any folder (say wikicode).

2. Dump/Export mysql database from phpMyAdmin.

3. Create a new folder in document root (say testwiki).

4. Copy index.php and LocalSettings.php from wiki installed directory (wikicode, in our case) to the newly created folder.

The folder structure should be like this.

public_html/
    wikicode/
        mediawiki folders/files
    testwiki/
        index.php
        LocalSettings.php



5. Make the necessary changes in LocalSettings.php, as shown below. Both the original and the altered code excerpts are shown.

Original LocalSettings.php right after installation.

[...]
if( defined( 'MW_INSTALL_PATH' ) ) {
    $IP = MW_INSTALL_PATH;
} else {
    $IP = dirname( __FILE__ );
}

$path = array( $IP, "$IP/includes", "$IP/languages" );
set_include_path( implode( PATH_SEPARATOR, $path ) );

[...]
$wgSitename = "Demowiki";

$wgScriptPath = "/wikicode";
$wgScript = "$wgScriptPath/index.php";
[...]
$wgStylePath = "$wgScriptPath/skins";
$wgStyleDirectory = "$IP/skins";
[...]
$wgUploadPath = "$wgScriptPath/images";
$wgUploadDirectory = "$IP/images";
[...]


Changed LocalSettings.php in the new folder (testwiki, in our case)

[...]
if( defined( 'MW_INSTALL_PATH' ) ) {
    $IP = MW_INSTALL_PATH;
} else {
    $IP = dirname( __FILE__ );
}
$IP = $_SERVER['DOCUMENT_ROOT'].'/wikicode/';
$path = array( $IP, "$IP/includes", "$IP/languages" );
set_include_path( implode( PATH_SEPARATOR, $path ) );

[...]
$wgSitename = "Demowiki";

//$wgScriptPath = "/wikicode";
$wgScriptPath = "/testwiki";
$wgScript = "$wgScriptPath/index.php";
[...]
//$wgStylePath = "$wgScriptPath/skins";
$wgStylePath = "http://localhost/wikicode/skins";
$wgStyleDirectory = "$IP/skins";
[...]
$wgUploadPath = "$wgScriptPath/images";
//$wgUploadDirectory = "$IP/images";
$wgUploadDirectory = $_SERVER['DOCUMENT_ROOT'].'/testwiki/images';



6. Change the database information accordingly.

In the modified LocalSettings.php, the lines commented out with // are the original values, and the uncommented lines that override them ($IP, $wgScriptPath, $wgStylePath, $wgUploadDirectory) are the added ones.

[...] marks code removed for brevity.



Now you can copy the folder (testwiki) as many times as you like, install a database for each copy and make the necessary changes in LocalSettings.php. This should let you set up a new wiki with very little effort on your part.
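Copying the folder and editing the script path can also be scripted. The following Python sketch shows one way to do it, under a few assumptions: the template folder already contains a working index.php and LocalSettings.php, $wgScriptPath is assigned on a single line with double quotes, and clone_wiki is a name invented for this example. It does not create the database or touch the database settings or $wgUploadDirectory, which still have to be changed by hand.

```python
import re
import shutil
from pathlib import Path


def clone_wiki(template_dir, new_dir, script_path):
    """Copy index.php and LocalSettings.php, then point $wgScriptPath at the copy."""
    template = Path(template_dir)
    target = Path(new_dir)
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing wiki
    for name in ("index.php", "LocalSettings.php"):
        shutil.copy2(template / name, target / name)

    settings = target / "LocalSettings.php"
    text = settings.read_text()
    # Only the active assignment is replaced; lines commented out with //
    # do not start with "$wgScriptPath" and are therefore left alone.
    text, count = re.subn(
        r'^([ \t]*\$wgScriptPath[ \t]*=[ \t]*)"[^"]*";',
        lambda m: m.group(1) + '"' + script_path + '";',
        text,
        count=1,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("no active $wgScriptPath assignment found")
    settings.write_text(text)
    return target
```

A call such as clone_wiki("public_html/testwiki", "public_html/newwiki", "/newwiki") would create the second wiki folder from the first one; the folder name newwiki is only an example.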



Updates:

Sorry, I didn't realize that a small addition to index.php is also needed, in case any of you have been trying to follow this. Funnily enough, it was only when I tried to follow these steps myself that I realized they didn't work yet.



Original index.php



require_once( './includes/Defines.php' );


Changed index.php



set_include_path(get_include_path().PATH_SEPARATOR.$_SERVER['DOCUMENT_ROOT'].'/wikicode/');

require_once( 'includes/Defines.php' );