SPSS Clementine: Data mining made straightforward
Written by Massoud Toussi   
Wednesday, 13 August 2008 21:02

SPSS Clementine is one of the very first software packages I tested for data mining. In fact, I began data mining with Clementine at a time when only a few solutions were available. One can consider SPSS Inc., the company that produced Clementine, a pioneer in this regard.

Although I don't like commercial software in general, nor do I like to write about commercial software on this website, it is by far the most straightforward and easy-to-use software I have found for efficient data mining. Let's review its advantages:

1. It has a beautiful and ergonomic graphical user interface which does not glitch every now and then, as many other beautiful GUIs do.

2. The software ships with a predefined methodology, CRISP-DM, which I find both simple and efficient. The software does not impose CRISP-DM on the user. Rather, it gives beginners a good method to follow, and from this point of view it can be considered a good educational initiative.

3. The software offers connections to different kinds of data sources, including SPSS exported files, Excel, text files, databases, etc.

4. The GUI employs data flows to guide the different subsets of the sample data through the experiment. The advantage of data flows is that one does not need to rebuild all the data preparation steps each time. They also let the user view all experiments on the same screen, which makes it easier to compare models.

5. In terms of modeling, it offers a wide arsenal of methods: neural networks, linear and logistic regression for prediction; rule induction, regression and decision trees for classification; Kohonen networks, K-means and TwoStep clustering for segmentation; Apriori, GRI and CARMA for association; and CAPRI and rule induction for sequence modeling.

6. While the number of available models is not as high as in RapidMiner, they seem more efficient to me because of the extensive use of commercial, enhanced algorithms. For example, the C5.0 algorithm in Clementine is an enhancement of Quinlan's C4.5 decision tree algorithm; it accepts both numeric and polynominal (multi-category nominal) variables, unlike many non-commercial decision tree models.
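To give an idea of how a C4.5-style tree can handle both kinds of variables, here is a minimal Python sketch of the gain ratio criterion that C4.5 is known for. It is not Clementine's or C5.0's actual code (C5.0 is proprietary); the function names are mine, and the numeric case is simplified by scoring every candidate threshold directly with the gain ratio.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of labels or attribute values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value cannot split anything.
    return info_gain / split_info if split_info > 0 else 0.0


def best_numeric_threshold(values, labels):
    """Turn a numeric attribute into a binary one by trying each midpoint
    between consecutive distinct values (simplified, hypothetical helper)."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        sides = ["<=" if v <= threshold else ">" for v in values]
        score = gain_ratio(sides, labels)
        if score > best[1]:
            best = (threshold, score)
    return best


# Toy data, invented for illustration only.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
humidity = [85, 90, 78, 96, 70, 65]
play = ["no", "no", "yes", "no", "yes", "yes"]

print("outlook gain ratio:", gain_ratio(outlook, play))
print("humidity best split:", best_numeric_threshold(humidity, play))
```

A nominal variable is scored as it is, while a numeric one is first cut at a threshold and then scored the same way, which is why a single tree can mix both.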

If you work with different kinds of data and need a high degree of responsiveness in your data mining projects, or if you want to teach data mining methods at a university, or if you simply do not have enough computer knowledge to code in R or SAS (in fact, if you prefer to code, you can do that with Clementine as well), Clementine would definitely be a good choice for you.



 
RapidMiner: The eye of mining
Written by Massoud Toussi   
Sunday, 03 August 2008 21:07
RapidMiner is an open-source data mining solution under a dual licence (community and enterprise editions), promoted for the combination of its leading-edge technologies and its functional range. Applications of RapidMiner cover a wide range of real-world data mining tasks. These are the pretty words that the RapidMiner team has placed on its site. Yet we must test the software before believing them, which is what I did for you here!

I installed it on a Centrino Duo laptop with 1 GB of RAM running Windows XP. The installation completed properly, without warnings or errors. The graphical interface is beautiful and fits well in the Windows environment (apparently it can also be installed on other systems, but the website did not explain what it meant by other systems). The documentation is very good and complete. In short, although it is free software, it is comparable to SPSS Clementine in its look and its graphic presentation.

In terms of functional richness, it surpasses some of the most expensive data mining software. It combines the features offered by Weka with those inherited from YALE (its progenitor).

Among its friendly features is a menu that groups the different statistical tests and data mining models into categories, much like the chapters of a good book on the subject.

Its modular architecture and the existence of more than 400 plugins already show its success.

Negative aspects should also be considered. Its user interface is quite unusual, and I spent a few hours before I was able to use it and run my first experiments. However, I think this is due to its original workflow, which I was not used to working with. A second flaw is that its handling of data files is not yet up to par: several times I was forced to change the format of my data because it could not import them.

Finally, I congratulate the team and the community, and strongly recommend using the software.


 
R or SAS, which one to choose?
Written by Massoud Toussi   
Friday, 01 August 2008 20:23

The R programming language is a free and open-source language and working environment for statistical computing and data mining, distributed under the GNU General Public License. It has become the preferred language of academics in biostatistics and bioinformatics. The growing number of contributed packages has given R numerous functionalities. Some serious projects, such as the Bioconductor project, have further increased its penetration in genomics and biotechnology. Moreover, the R Graph Gallery is probably the most comprehensive collection of data visualisation functionalities that has ever existed for a free or commercial statistical software package.

However, companies are still working with the SAS System from SAS Institute, a software framework for data entry, retrieval, management and mining. A friend of mine recently obtained his PhD in bioinformatics, and to find a job in a company he finally had to obtain a SAS certificate, although he had done all of his thesis work with R. Do companies really have to choose SAS over R to find clients or carry out projects?

For companies the situation is a little different. Let's see why a company might prefer R as its statistical analysis software:

  • It is available for no cost.
  • The source code of its functions is open.
  • There are more 
  • The software evolves more rapidly.
  • Community support, especially from academia.

And why might a company prefer to use SAS?

  • Its database is optimized for large volumes of data.
  • Data management is easier.
  • The software is more user friendly.

If you are a company that manipulates databases with a low to moderate data turnover, R is the better choice for you. If you are a company with a large amount of data entry and data management, and thus a large data turnover, you will need SAS. Enough documentation and support is now available in both cases. If you are already using SAS and want to switch to R to reduce your costs, do not forget to take the cost of the migration into account, although in any case such a change will reduce your costs.

If you want the software more for business analysis and less for biomedical computing, know that SAS is the leading company in that domain. However, a good R programmer can also do high-quality business analysis with R.



 
Cuil is Cool but...
Written by Massoud Toussi   
Tuesday, 29 July 2008 17:35

The new search engine www.cuil.com (pronounced "cool") was built by a number of ex-Google employees and claims to index 121,617,892,992 web pages, which is believed to be ten times more than what Google indexes today.

The search results are presented in a more "cool" layout with text and images. The user can choose the number of columns (2 or 3), and a number of tabs group results that are related to each other (which implies that the search engine is probably capable of generating an ontology). The stylish interface can also be displayed with tabs.

The owners of the project claim that, unlike Google, this search engine does not record people's search history, and that it works in a more honest and humanistic way, protecting the privacy of internet surfers.

There is no apparent advertising; therefore, one may ask how they will be able to earn enough money to battle with the king of advertisers!

A number of shortcomings show up even in the first round of tests. In comparison with Google, results seem to be less relevant to the search keywords, especially when there are many of them, and especially for languages other than English. This may be because of the purely automatic mechanism of Cuil's search engine, compared with the human-assisted indexing that Google seems to rely on. Of course, with some learning algorithm they could improve the relevance of search results.

The preferences are rather few (only two: one for safe search and the other for typing suggestions).

As a last word, Cuil offers another way of viewing search results, one that has been offered neither by Microsoft nor by Yahoo. It deserves to be tested: http://www.cuil.com !

 



 
The Community
Written by Administrator   
Thursday, 12 October 2006 16:50

Got a question? The forum and the community being formed around CloseClinical will help you answer your questions on any topic related to health informatics.

Do you want to show off your products or projects in medical informatics and bioinformatics? Go ahead, we have a section dedicated to this. You can advertise your product on the CloseClinical website free of charge.

Do you want to join in?

If you are interested in medical informatics or bioinformatics, you can sign up on our site. Membership is free and carries no obligation. You WILL NOT receive any unwanted commercial or non-commercial messages in your mailbox.

Membership will allow you:

  • To participate in forums.
  • To promote your research project, or your commercial product.
  • To talk about your services regarding the domain.
  • To find partners for your projects or business plans.
  • To publish your ideas.
  • To find a job, or to publish a job announcement.
  • To be one of the privileged participants in open source or commercial software reviews.


 