Q1. The Structured Query Language (SQL) is used to communicate with a database. According to the American National Standards Institute (ANSI), it is the standard language for relational database management systems (RDBMS). SQL is used to update and retrieve data in a database. Common relational database management systems that use SQL include Oracle, Sybase, Microsoft SQL Server, Access, and Ingres (What is SQL?, 2021).
One common use of SQL is text mining and analysis. According to Asanka (2020), modeling text data poses many challenges. Besides being unstructured, text data tends to come in large volumes, and differences in writing style can make it difficult to analyze. Nevertheless, SQL can help overcome these challenges.
Q2. Relational database management systems (RDBMSs) store information in related tables with logic-based connections between them. A language was needed to work with and analyze the information in these tables, and Structured Query Language (SQL) is that language. More importantly, SQL lets users work with the data inside the RDBMS itself rather than extracting it into software such as R (EMC Education Services, 2015, p. 389). SQL is also interesting in that it can not only pull data from an RDBMS but also push data into one.
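The push and pull described above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database driven from Python's standard sqlite3 module as a stand-in for a full RDBMS; the `customers` table, its columns, and the sample rows are hypothetical and do not come from the cited sources.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table, invented for illustration only.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")

# "Push": write rows into the table with INSERT.
conn.executemany(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.commit()

# "Pull": read the rows back with SELECT, without exporting the table
# to another tool first.
rows = conn.execute(
    "SELECT customer_id, name FROM customers ORDER BY customer_id"
).fetchall()
print(rows)
```

The same INSERT and SELECT statements would look much the same on other SQL databases, although connection setup and placeholder syntax differ between drivers.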
For in-database text analysis, several features make SQL especially useful. SQL can filter and sort information (such as online transactions) much as you would in Microsoft Excel, only with very large amounts of data (School of DATA, 2021, n.p.). SQL uses queries to filter information from selected tables, for example retrieving the customer IDs from the last 30 days of online transactions. Most interesting is that SQL can nest queries inside other queries. Building on the example above, a query over the last 30 days of transactions can also pull related information from other tables, such as the products and quantities sold, producing a table that can then be analyzed further.
For text analysis specifically, SQL can call functions such as regular expressions to pick specific data out of the tables (EMC Education Services, 2015, p. 400). The SELECT statement chooses which data to return, and the WHERE clause supplies the condition that rows must meet. This leads to window functions, which can produce tables of customers' spending habits, departments shopped, and so on from the tables within the RDBMS. These results can then be developed into moving averages of total sales or clusters of products bought together (for example, with k-means), giving a great deal of insight into customer shopping habits for further analysis, or for something as practical as targeted couponing or targeted sales.
One potentially troubling aspect of SQL and RDBMSs is that data can be pushed and pulled while working on the dataset as a whole. This creates opportunities for data to be corrupted or accessed by unauthorized parties, as in the data breaches at large retailers such as Target and Walmart, which exposed many customers to potential fraud.
Although it is presented as an advantage that SQL, unlike R or Python, does not need to operate on extracted data, working on an extract in R or Python at least provides some basic security, since only a partial loss of data could occur.
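To make the query features discussed in this answer more concrete, here is a minimal sketch of a 30-day filter, a nested query, a regular-expression match, and a window function. It assumes SQLite (version 3.25 or later, for window functions) run through Python's sqlite3 module; the `transactions` table, its columns, the sample rows, and the fixed reference date are all hypothetical. SQLite does not ship a REGEXP implementation, so the sketch registers one from Python; other databases expose regular expressions through their own operators or functions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# "X REGEXP Y" in SQLite calls a user-supplied function regexp(Y, X),
# so the pattern arrives as the first argument.
conn.create_function(
    "REGEXP", 2, lambda pattern, value: re.search(pattern, value) is not None
)

# Hypothetical sample data, invented for illustration only.
conn.executescript("""
CREATE TABLE transactions (
    customer_id INTEGER, product TEXT, amount REAL, sold_on TEXT, note TEXT
);
INSERT INTO transactions VALUES
    (1, 'coffee', 10.0, '2021-03-05', 'coupon SPRING21 applied'),
    (1, 'tea',    20.0, '2021-03-10', ''),
    (2, 'coffee', 15.0, '2021-03-12', 'gift wrap'),
    (1, 'mug',    30.0, '2021-03-20', 'coupon MUG5 applied'),
    (3, 'tea',     5.0, '2021-01-02', 'coupon OLD1 applied');
""")

AS_OF = "2021-03-31"  # fixed reference date so the results are reproducible

# 1) SELECT + WHERE: customer IDs with a transaction in the last 30 days.
recent_customers = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer_id FROM transactions "
    "WHERE sold_on >= date(?, '-30 days') ORDER BY customer_id",
    (AS_OF,),
)]

# 2) Query within a query: products bought by those recent customers.
products = [row[0] for row in conn.execute(
    "SELECT DISTINCT product FROM transactions WHERE customer_id IN ("
    "  SELECT customer_id FROM transactions"
    "  WHERE sold_on >= date(?, '-30 days')"
    ") ORDER BY product",
    (AS_OF,),
)]

# 3) Regular expression: rows whose free-text note mentions a coupon code.
coupon_rows = conn.execute(
    "SELECT customer_id, product FROM transactions "
    "WHERE note REGEXP ? ORDER BY sold_on",
    (r"coupon [A-Z0-9]+",),
).fetchall()

# 4) Window function: running total of spending per customer over time.
running_totals = conn.execute(
    "SELECT customer_id, sold_on, "
    "       SUM(amount) OVER (PARTITION BY customer_id ORDER BY sold_on) "
    "FROM transactions ORDER BY customer_id, sold_on"
).fetchall()

print(recent_customers, products, coupon_rows, running_totals, sep="\n")
```

All four queries run inside the database, so only the small result sets reach Python. A moving average could be sketched the same way by swapping `SUM` for `AVG` and adding a frame such as `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`.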