On this web site we propose a web service for doing "multiple linear regression analysis" in real time, online and on demand.
'What is "multiple linear regression analysis"?' you ask.
Well, suppose you are an agricultural researcher and you are doing an experiment with apples.
Your goal is to predict the amount of apples a tree will yield.
The yield typically depends on many factors.
To keep it simple, let's say that the yield depends on three things:
- average sunshine throughout the growing season,
- average temperature throughout the season, and
- amount of fertilizer given to each tree.
So you do some research and collect data on various apple trees around the world.
For each apple tree in your survey, you collect data on the three factors (making sure that you measure them in a consistent way).
Because you are doing this "formally", you label the three types of measurement as follows:
- x1 - Average sunshine per day (hours),
- x2 - Average temperature, and
- x3 - Amount of fertilizer (kilos).
You also collect the yield for each apple tree in the survey, and you label the measurement as follows:
- y - Number of apples.
You send an envelope full of forms to each farmer and ask them to record x1, x2, x3 and y for each of their trees.
Then you sit back and wait.
A week or so later you receive a letter from each farmer containing their completed forms.
When you count the forms from all the farmers, you find that a total of 50 trees were measured.
You spend an afternoon entering the data into your spreadsheet.
Your goal is to use all this valuable data to predict how much fertilizer to add to each tree in order to maximize the yield (y), so that next year the farmers do better.
Of course the answer is not as simple as "put on masses of fertilizer".
Your experience tells you that you need to "tune" the amount of fertilizer depending on the sunshine and temperature.
Now, because you are clever, you suspect (or know from experience) that there is a formula of the form:
y = a1*x1*x1 + a2*x2 + a3*x3*x3,
that gives a good prediction of the number of apples y, where a1, a2 and a3 are numerical coefficients or weights (that reflect the relative influence of the three independent measurements on the number of apples).
However, you don't know the values of the coefficients a1, a2 and a3.
You don't know them because it is difficult to see any pattern "by eye" in the data the farmers recorded.
However, there are some relatively simple ways to calculate the "best" values of a1, a2 and a3, given the 50 sets of data and the form of the equation.
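If "best" is taken to mean least squares (the usual choice, but an assumption here, since the text does not say which criterion the service uses), the coefficients are the values that minimize the sum of squared differences between the recorded and predicted yields over the 50 trees. Here x_{1,i} denotes x1 for tree i, and likewise for x2, x3 and y:

```latex
S(a_1, a_2, a_3) = \sum_{i=1}^{50} \left( y_i - a_1 x_{1,i}^2 - a_2 x_{2,i} - a_3 x_{3,i}^2 \right)^2

% Setting each partial derivative of S to zero gives the normal equations,
% where row i of X is (x_{1,i}^2, x_{2,i}, x_{3,i}^2) and a = (a_1, a_2, a_3)^T:
X^{\mathsf{T}} X \, a = X^{\mathsf{T}} y
```

This is a 3-by-3 system of linear equations in a1, a2 and a3, which is one reason the calculation is relatively simple.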
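The body of mathematics used to calculate the values of a1, a2 and a3 is called multiple linear regression analysis.
It is "linear" because the formula is linear in the unknown coefficients a1, a2 and a3, even though x1 and x3 appear squared, and "multiple" because there is more than one independent measurement.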
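The service's own algorithm is not described here, so the following is only a minimal sketch of one standard approach: ordinary least squares via the normal equations, in plain Python with no external libraries. The names `features`, `solve` and `fit`, and the `(x1, x2, x3, y)` record layout, are hypothetical illustrations, not part of any existing API.

```python
"""Least-squares fit of y = a1*x1**2 + a2*x2 + a3*x3**2 (hypothetical sketch)."""
from typing import List, Sequence, Tuple

Record = Tuple[float, float, float, float]  # (x1, x2, x3, y) for one tree


def features(x1: float, x2: float, x3: float) -> List[float]:
    # The model is linear in a1, a2, a3 once x1 and x3 are squared.
    return [x1 * x1, x2, x3 * x3]


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is (near) singular; the data may not vary enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / aug[r][r]
    return coeffs


def fit(records: Sequence[Record]) -> List[float]:
    """Return [a1, a2, a3] minimizing the sum of squared prediction errors."""
    k = 3
    xtx = [[0.0] * k for _ in range(k)]  # accumulates X^T X
    xty = [0.0] * k                      # accumulates X^T y
    for x1, x2, x3, y in records:
        f = features(x1, x2, x3)
        for i in range(k):
            xty[i] += f[i] * y
            for j in range(k):
                xtx[i][j] += f[i] * f[j]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Made-up records generated from a1 = 2, a2 = 3, a3 = 0.5 (not real survey data).
    records = [
        (x1, x2, x3, 2 * x1 * x1 + 3 * x2 + 0.5 * x3 * x3)
        for x1 in (4, 6, 8) for x2 in (10, 15, 20) for x3 in (1, 2, 3)
    ]
    print(fit(records))  # values very close to [2.0, 3.0, 0.5]
```

Forming the normal equations is the simplest approach to explain; a production service would more likely use a QR- or SVD-based least-squares solver, which is numerically more stable when the measurements are nearly collinear.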
There are hundreds of computer programs available to do the calculations; they range from the completely free to the very expensive.
So what is new about the idea of this web site?
We can answer this question by continuing with the apple tree analogy.
Instead of maintaining the software, organizing the forms, posting them off to the farmers and so on, you subscribe to this web site (for a modest fee).
You give each farmer a username and password for your account on this web site, and they log in and enter the data for their trees directly, using web-based forms.
From the comfort of your office, you monitor the progress of the data entry on the web site. When all the farmers have entered their data, you press the "go" button, and the server performs the calculations and displays the "best" values of a1, a2 and a3.
You can also use the values of a1, a2 and a3 to do "what if" calculations.
You enter some predicted average sunshine hours, average temperature and kilos of fertilizer for next season, and the web site will calculate the predicted yield.
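A "what if" calculation just evaluates the fitted formula for new inputs. The sketch below assumes a hypothetical function name, `predict_yield`, and made-up coefficients (a1 = 2, a2 = 3, a3 = 0.5) standing in for values a regression would return; they do not come from any real data.

```python
from typing import Sequence


def predict_yield(coeffs: Sequence[float], sunshine_hours: float,
                  temperature: float, fertilizer_kg: float) -> float:
    """Evaluate y = a1*x1**2 + a2*x2 + a3*x3**2 for one set of "what if" inputs."""
    a1, a2, a3 = coeffs
    return a1 * sunshine_hours ** 2 + a2 * temperature + a3 * fertilizer_kg ** 2


if __name__ == "__main__":
    # Hypothetical coefficients, as if returned by the regression.
    coeffs = [2.0, 3.0, 0.5]
    # Try several fertilizer amounts for the same predicted sunshine and temperature.
    for kg in (1.0, 2.0, 3.0):
        print(kg, predict_yield(coeffs, 6.0, 15.0, kg))
```

For example, `predict_yield([2.0, 3.0, 0.5], 6.0, 15.0, 2.0)` returns 119.0 (2 × 36 + 3 × 15 + 0.5 × 4).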
Your data is held in a secure and private area of the central server, so when you are attending the annual general meeting of the cider-growers' union in Montreal, you don't have to take your laptop.
You can just go into a cyber cafe and run some predictions after logging in to the web site.
You and your company don't have the hassle of maintaining software on your own computers; that's all done by our team of mathematicians on the central server.
As soon as a new algorithm or service is implemented on the web site, you get immediate access to it as part of your account.
Many businesses and government agencies already do multiple linear regression analysis.
What's new about this idea is that colleagues around the world can enter the data directly via the web site.
Do you think the idea can fly?
Please e-mail us at: dominique@scrapeworld.com and tell us what you think.