Large scale databases with PHP/Mysql

by @ 8:38 am on December 20, 2007.  Filed under Mysql, PHP

I’m currently faced with a paradox. I have a huge database (one table is more than 84 million records) that requires daily updates of about 2 m records and probably daily inserts of about 500k or more. Currently, I am trying to achieve this with Mysql however I am finding that the server I’m hosting the application is far from ideal. So my dilemma is, do I spend my hard earned money on a beefy server (>$400 a month) or do I build one and keep it here at my house on my Verizon Fios fiber connection. If I do the first option, it eats into my hard earned cash. If I do the second option, I have to figure out how to connect my home box to my live website in a reliable manner.

The downside to the fios connection is, I do not get a static IP. Also, I have a suspicion that they are blocking incoming ports.

I’ve decided to take the latter route, build a machine for home that will host the database on some RAID1 drives. I have a few SCSI 320 drives as well, so a RAID 10 setup would be optimal. But my first attempt is going to be on an SATA Raid card made by Silicon Image. The complexities of this application are too numerous to go into, but the general goals are to parse large text files, analyze them and store the results in the database. The issue lies in the difficulty of applying these massive updates to Mysql in a way that won’t take half a day to process. It’s a two part equation. The server that is doing the data analysis has to be respectable in terms of hardware/cpu/memory, however, the live web server which also hosts a “copy” of the database for the website to actually use. Uploading the daily updates to the live server will be the bottleneck in terms of system performance. Tests will tell if my RAID on SATA will be of any benefit. If not, I will switch to my SCSI RAID and work from there. I’ve been searching Google for answers on the topic of large scale data warehouses and data mining but I am coming up empty handed. Got any ideas?

Use a code generator to create PHP applications - DBInspector

by @ 7:49 am on December 5, 2007.  Filed under PHP

Editors Note: I wrote this article originally for The Code Project. Since it is such a useful program, I am posting it here as well. Although the examples shown pertain to a VisualBasic.net application, it is also a good solution for PHP programmers. The software uses code templates to generate otherwise redundant code. For example, if you are rewriting an application and need to write a data access layer, you will find this software very useful. DBInspector can reverse engineer MySQL, Postgresql, and Microsoft SQL Server databases. You can then create templates for generating Smarty templates, PHP pages, module code, and database access classes. I have used this software on large scale commercial projects from PHP to Java and VB.NET.

Making the case for code generation

DBInspector - Doin the dirty work for you! Download: dbinspector-1-0.zip (13.0 MB)

As any programmer of even modest abilities can attest, writing code to implement a system can be a redundant and laborious task. Often we wonder, why must we type essentially the same exact code over and over again, from project to project. We wonder this even as we are writing the program code. We purchase expensive Integrated Development Environments which effectively are nothing more than oversized Notepads. These wonderful IDEs allow us to develop and test our code, maybe even design it visually, but seldom do they actually make our jobs easier. In order to make my job easier, a tool basically has to do my job for me, at my command.

This is the premise of Code Generation tools. To take an 80/20 approach to producing programs. Unfamiliar with the 80/20 rule?

Pareto’s Principle - The 80-20 Rule

In 1906, Italian economist Vilfredo Pareto created a mathematical formula to describe the unequal distribution of wealth in his country, observing that twenty percent of the people owned eighty percent of the wealth. In the late 1940s, Dr. Joseph M. Juran inaccurately attributed the 80/20 Rule to Pareto, calling it Pareto’s Principle. While it may be misnamed, Pareto’s Principle or Pareto’s Law as it is sometimes called, can be a very effective tool to help you manage effectively.

Where It Came From

After Pareto made his observation and created his formula, many others observed similar phenomena in their own areas of expertise. Quality Management pioneer, Dr. Joseph Juran, working in the US in the 1930s and 40s recognized a universal principle he called the “vital few and trivial many” and reduced it to writing. In an early work, a lack of precision on Juran’s part made it appear that he was applying Pareto’s observations about economics to a broader body of work. The name Pareto’s Principle stuck, probably because it sounded better than Juran’s Principle.

As a result, Dr. Juran’s observation of the “vital few and trivial many”, the principle that 20 percent of something always are responsible for 80 percent of the results, became known as Pareto’s Principle or the 80/20 Rule. You can read his own description of the events in the Juran Institute article titled Juran’s Non-Pareto Principle.

What It Means

The 80/20 Rule means that in anything a few (20 percent) are vital and many(80 percent) are trivial. In Pareto’s case it meant 20 percent of the people owned 80 percent of the wealth. In Juran’s initial work he identified 20 percent of the defects causing 80 percent of the problems. Project Managers know that 20 percent of the work (the first 10 percent and the last 10 percent) consume 80 percent of your time and resources. You can apply the 80/20 Rule to almost anything, from the science of management to the physical world.
(more…)

So you want to learn PHP…

by @ 11:18 pm on December 3, 2007.  Filed under PHP

While browsing the shelves at the local bookstore I made a mental note: There aren’t many PHP books. I picked up a few and took a look inside. They were ok, but, right off the bat, I felt like they were teaching the readers bad habits. I think that the most important time for a programmer, or a newbie programmer, to form good habits is when he writes his first few lines of code. I thought I’d take a few minutes and put down some of my pet peeves and see what others thought.

PHP is not just a “scripting language.”

Let’s get that out of the way. PHP is a full blown Web Application Development language. It can do anything a skilled programmer wants it to do. It benefits from being easy to read, easy to understand, and has roots in C, one of the grandaddies of programming languages. Although syntacticly similar, PHP is not C and doesn’t need to be. The point is, once you learn PHP, you can more easily grasp other languages such as Java, and C, etc.

Separate your laundry, please

Although you can mix PHP in HTML, SQL, etc, it doesn’t mean you should. Try to treat each layer as a separate realm. It’s ok to use PHP in HTML for presentation purposes, such as displaying the date, etc. But you should not place your business rules, database queries, etc, in your HTML files. It leads to horrible problems and bad habits. Yes, I concede, it still produces a working program, but you will thank yourself later if you put some discipline and structure into your coding style early on.

Create functions for database queries

This is a very important lesson. Do not be tempted to write out a quick and dirty mysql_query statement in the middle of your page. You will find yourself writing the same queries over and over. As soon as you change the database structure, you have many pages to fix. Write a function library for all of your queries, this leaves you with one file to modify, and, gives you the bonus of writing reusable functions.

Find a framework, learn the framework, the framework is your friend

There are countless “application frameworks” that help you create programs faster, better, smarter. Zend Framework, CakePHP, Drupal just to name a few. Learn how they solve common problems. Writing code is usually writing the same code over and over. Best to use someone else’s framework because they have already addressed many of the problems you will face.

PHP is a powerful programming language. With it you can produce great software, connect it to popular database engines like Mysql and Postgresql, and it has the added benefit of being installed on almost every webserver on the planet. PHP won’t let you down if you give it enough time and patience to learn PHP programming the right way. There are thousands of top notch software in use today with PHP that were written the wrong way, but they still function. It’s just a major challenge in maintaining the spaghetti code. Make your life easier with PHP, learn the right way, early on.

This is just a few ideas to get the new PHP developer on his feet. I’m sure there are many more suggestions, can you think of any off the top of your head?

web·pit n. a place of discovery; a repository of information; where coders come for enlightenment; a programmers diary.

internal links:

categories:

search:

archives:

December 2007
M T W T F S S
    Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31  

other:

Great Places

  • websiterepairguys.com
    Millions of websites need fixing. Here's the man for the job!
  • mydnsmanager.com
    Free DNS hosting, domain parking and domain name research tools
  • phpfreaks.com
    Excellent resource for all things PHP
  • boonex.com
    Boonex makes great, FREE, social networking software called Dolphin Smart Community Builder

About Me

My name is Mark Williamson and I am somewhat of a PHP freelancer and all around hacker. I have many years of experience in software development, covering a wide variety of languages, databases, and functional areas. From website content management systems, medical billing, website load testing, billing automation, and sales automation, I can pretty much say I've seen a few million lines of code. I am usually available for after-hours contract work (I have a full time day job). If you need assistance implementing ideas for your website, contact me and I'll give you an honest opinion on what it would take. You can see my other website at WebsiteRepairGuys .com (link at the top of this column).
All content Copyright (c) 2008, Mark Williamson