Import Data from Database
Besides indexing the articles through SOLR Update Request-Handler, you can also import data from database. This example will demonstrate how to import data from MySQL database. Assuming you have MySQL installed, let’s run the below simple schema in order to create a database/table with some records:
/* create database */
CREATE DATABASE itblogs DEFAULT CHARACTER SET utf8;
USE itblogs;
CREATE TABLE FILMS( id BIGINT NOT NULL AUTO_INCREMENT, name VARCHAR(50), initial_release_date DATE, direct_by_id BIGINT NOT NULL, last_modified TIMESTAMP NOT NULL, PRIMARY KEY (id) );
CREATE TABLE DIRECTED_BY( id BIGINT NOT NULL AUTO_INCREMENT, name VARCHAR(50), last_modified TIMESTAMP NOT NULL, PRIMARY KEY (id) );
CREATE TABLE FILMS_GENRE_MAPPING( films_id BIGINT NOT NULL, genre_id BIGINT NOT NULL, last_modified TIMESTAMP NOT NULL );
CREATE TABLE GENRE( id BIGINT NOT NULL AUTO_INCREMENT, name VARCHAR(50), last_modified TIMESTAMP NOT NULL, PRIMARY KEY (id) );
INSERT INTO DIRECTED_BY(name,last_modified) VALUES (‘Peter Pan’,now()); INSERT INTO DIRECTED_BY(name,last_modified) VALUES (‘Mary Pan’,now()); INSERT INTO DIRECTED_BY(name,last_modified) VALUES (‘May Pan’,now());
INSERT INTO FILMS(name,initial_release_date,direct_by_id,last_modified) VALUES (‘Movie 1′,’2016-01-01’,1,now()); INSERT INTO FILMS(name,initial_release_date,direct_by_id,last_modified) VALUES (‘Movie 2′,’2016-01-02’,2,now()); INSERT INTO FILMS(name,initial_release_date,direct_by_id,last_modified) VALUES (‘Movie 3′,’2016-01-03’,3,now());
INSERT INTO GENRE(name,last_modified) VALUES (‘Animation’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Short Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Adventure Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Action Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘War Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Superhero Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Drama’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Romance Film’,now()); INSERT INTO GENRE(name,last_modified) VALUES (‘Teen Film’,now());
INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (1,1,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (1,2,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (1,3,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (1,4,now());
INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (2,3,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (2,5,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (2,6,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (2,9,now());
INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (3,1,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (3,3,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (3,7,now()); INSERT INTO FILMS_GENRE_MAPPING(films_id,genre_id,last_modified) VALUES (3,8,now()); |
Assuming with below table relationship:
- FILMS and DIRECTED_BY is one-to-one mapping
- FILMS and GENRE is one-to-many mapping
You may need to update SOLR’s timezone by changing C:/{your_installed_path}/solr-5.4.1/bin/solr.cmd to your timezone:
Then, we need to setup data-import handler, please update C:/{your_installed_path}/solr-5.4.1/server/solr/films/conf/solrconfig.xml by:
- Adding <lib dir=”${solr.install.dir:../../../..}/dist/” regex=”solr-dataimporthandler-.*\.jar” /> as:
- Adding:
<requestHandler name=”/dataimport” class=”solr.DataImportHandler”>
<lst name=”defaults”>
<str name=”config”>db-data-config.xml</str>
</lst>
</requestHandler>
We need to add one more configure file “db-data-config.xml” to C:/{your_installed_path}/solr-5.4.1/server/solr/films/conf/ :
<dataConfig>
<dataSource type=”JdbcDataSource” driver=”com.mysql.jdbc.Driver” url=”jdbc:mysql://localhost:3306/itblogs” user=”root” password=”root”/> <document> <entity name=”films” pk=”id” query=”select * from FILMS” deltaImportQuery=”select * from FILMS WHERE id=’${dataimporter.delta.id}'” deltaQuery=”select id from FILMS where last_modified > ‘${dataimporter.last_index_time}'”> <field column=”id” name=”id” /> <field column=”name” name=”name” /> <field column=”initial_release_date” name=”initial_release_date” /> <entity name=”director” pk=”id” query=”select name from DIRECTED_BY where id=’${films.direct_by_id}'” deltaQuery=”select id from DIRECTED_BY where last_modified > ‘${dataimporter.last_index_time}'” parentDeltaQuery=”select id from FILMS where id=${director.id}”> <field name=”directed_by” column=”name” /> </entity> <entity name=”films_genre” pk=”films_id” query=”select genre_id from FILMS_GENRE_MAPPING where films_id=’${films.id}'” deltaQuery=”select films_id, genre_id from FILMS_GENRE_MAPPING where last_modified > ‘${dataimporter.last_index_time}'” parentDeltaQuery=”select id from FILMS where id=${films_genre.films_id}”> <entity name=”genre” pk=”id” query=”select name from GENRE where id = ‘${films_genre.genre_id}'” deltaQuery=”select id from GENRE where last_modified > ‘${dataimporter.last_index_time}'” parentDeltaQuery=”select films_id, genre_id from FILMS_GENRE_MAPPING where genre_id=${genre.id}”> <field name=”genre” column=”name” /> </entity> </entity> </entity> </document> </dataConfig>
|
Please note that you may need to ‘user=”root” password=”root”’ for your MySQL database connection.
Finally, please create a folder under C:/{your_installed_path}/solr-5.4.1/server/solr/lib/ and store MySQL JDBC driver Jar file (e.g. mysql-connector-java-X.X.X.jar), you may download it from https://dev.mysql.com/downloads/connector/j/5.0.html .
Restart the SOLR by “solr restart –p 8983”, and go to http://localhost:8983/solr/#/films/dataimport/ . Please note that it has 2 options for Command, full-import and delta-import:
Command | Details |
full-import | It will full import from database.
So, you need to be careful when doing full-import, because it will import all articles, it may take a long time. |
delta-import | It will check the last_modified of the table, and just import article(s) with last_modified >= ${dataimporter.last_index_time} .
SOLR will store ${dataimporter.last_index_time} internally, and it will be used for next delta-import. Every import, SOLR will update this value. |
Please note if enabling the option “Clean”, SOLR will clean all articles no matter full-import or delta-import execution.
Now, you can try both full-import and delta-import the data. However, for the above db-data-config.xml example, when someone deletes some from records from FILMS, FILMS_GENRE_MAPPING or DIRECTED_BY tables, it will not update the SOLR’s data when triggering delta-import. You can fix this problem by updating FILMS table è last_modified field when your system deletes some records from FILMS_GENRE_MAPPING or DIRECTED_BY.
For FILMS_GENRE_MAPPING and DIRECTED_BY case:
When your system make this query : “DELETE FROM FILMS_GENRE_MAPPING WHERE films_id=1 and genre_id=4;” , it need to make this request as well : “UPDATE FILMS SET last_modified=now() WHERE id=1”, so that delta-import will import the FILMS record (id=1) again into SOLR.
For FILMS and DIRECTED_BY cases:
You can use “deletePkQuery”:
- Create a deleted key table “DELETED_DIRECTED_BY“ like
CREATE TABLE DELETED_DIRECTED_BY(
direct_by_id BIGINT NOT NULL, last_modified TIMESTAMP NOT NULL ); |
- When your system deletes a record from DIRECTED_BY table, e.g. id=1.
- Then, it need insert a record into DELETED_DIRECTED_BY as “INSERT INTO DELETED_DIRECTED_BY (direct_by_id, last_modified) values (1, now());”
- And the db-data-config.xml should update as:
<entity name=”director” pk=”id”
query=”select name from DIRECTED_BY where id=’${films.direct_by_id}'”
deltaQuery=”select id from DIRECTED_BY where last_modified > ‘${dataimporter.last_index_time}'”
parentDeltaQuery=”select id from FILMS where id=${director.id}”
deletedPkQuery=”SELECT direct_by_id as id FROM DELETED_DIRECTED_BY WHERE deleted_at > ‘${dataimporter.last_index_time}'”>
<field name=”directed_by” column=”name” />
</entity>
Besides, importing from database, it can also import from another SOLR server, TIKA and etc, you can find more details from example C:/{your_installed_path}/solr-5.4.1/example-DIH/solr/ .
OK, data is imported into SOLR now, we can try some NoSQL query.