This thesis proposal is taken from Nationella Exjobb-poolen.
Executing Statistical Programs from SQL
The Department of Medical Epidemiology and Biostatistics (ki.se/meb) at Karolinska Institutet performs advanced analyses of large data volumes for epidemiological research. The volume of data and the complexity of the analyses require state-of-the-art database technologies, and new methods have to be developed as well. The department offers you the opportunity to gain practical experience with relational database management systems and to learn or develop state-of-the-art methods in data management. This experience can be applied to large scientific and medical databases as well as to advanced business intelligence.
Data analyses for epidemiological research are specified as statistical programs, which are then executed in statistical software such as SAS (www.sas.com) and R (www.r-project.org). The data for the analyses, on the other hand, are often stored in a relational database management system (RDBMS) such as DB2 from IBM (www.ibm.com/db2), Oracle (www.oracle.com/database), or SQL Server from Microsoft (www.microsoft.com/sql). RDBMSs provide efficient management of large data volumes, data protection and security, and SQL, a high-level declarative language for describing data management queries. The current approach to combining an RDBMS with statistical software is to implement database access in the statistical program: the program retrieves the data for analysis from the RDBMS and then performs the statistical analysis and any additional data management in the statistical package.
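As a rough illustration of this current approach, the sketch below retrieves raw rows from a database and computes the statistic on the client side. It rests on several assumptions that are not part of the proposal: Python's built-in sqlite3 module stands in for DB2, the statistics module stands in for SAS, and the table `measurements` and the function `mean_by_cohort` are hypothetical names chosen for the example.

```python
import sqlite3
import statistics

# Assumption: an in-memory SQLite database stands in for the research RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (cohort TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("a", 5.0), ("b", 2.0), ("b", 4.0)],
)


def mean_by_cohort(connection):
    """Pull every row out of the database, then analyse it client-side."""
    groups = {}
    query = "SELECT cohort, value FROM measurements"
    for cohort, value in connection.execute(query):
        groups.setdefault(cohort, []).append(value)
    # The statistical step runs in the client program, not in the RDBMS.
    return {cohort: statistics.mean(values) for cohort, values in groups.items()}


result = mean_by_cohort(conn)
print(result)
```

Note that all rows cross the boundary between the database and the statistical program before any analysis takes place, and the grouping is redone by hand outside SQL.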
In this project you will investigate a different approach to combining an RDBMS with statistical analyses. You will develop interface functions that extend SQL and call functions or programs in the statistical software. This will make it possible to write a single SQL query that performs both data management and data analysis. During execution of such a query, the RDBMS will invoke programs or functions in the statistical software and then either return the analysis result as the result of the query or combine it with the results of other data analyses and data management operations.
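The idea of extending SQL with statistical functions can be sketched in miniature as follows. This is only an analogy under stated assumptions, not the DB2 and SAS implementation the project calls for: SQLite (via Python's sqlite3 module) stands in for the RDBMS, Python's statistics module stands in for the statistical software, and the names `Median`, `stat_median` and `measurements` are hypothetical.

```python
import sqlite3
import statistics


class Median:
    """User-defined aggregate whose computation runs outside the SQL engine."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called by the database once per row in the current group.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group; the result becomes the SQL value.
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
# Register the aggregate so that SQL queries can call stat_median(x).
conn.create_aggregate("stat_median", 1, Median)

conn.execute("CREATE TABLE measurements (cohort TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("a", 10.0), ("b", 2.0), ("b", 6.0)],
)

# One query combines data management (WHERE, GROUP BY, ORDER BY)
# with the externally implemented analysis function.
rows = conn.execute(
    "SELECT cohort, stat_median(value) FROM measurements "
    "WHERE value > 0 GROUP BY cohort ORDER BY cohort"
).fetchall()
print(rows)
```

Here the database engine drives the computation and calls out to the analysis code per group, so filtering and grouping stay in SQL and only the statistical step is delegated.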
It is proposed that you work with the database management system DB2 from IBM and the statistical software SAS. You will implement the interface to SAS as foreign functions or a wrapper, which are the standard DB2 mechanisms for interfacing with other software packages. You will test your implementation with actual research analyses over research data stored in DB2. You will also investigate connecting DB2 to other statistical packages. In addition to the implementation, you will deliver a report written in English.
Good programming skills and knowledge of mathematics are essential. It is a plus if you have taken a database course and have practical experience with an RDBMS and SQL.
The project requires you to be at the department (Stockholm/Solna) every day during the implementation phase. All conversations and discussions will be held in English. Note that this is an academic project.